Module-1
NOSQL
BY
SHASHIKALA H K
OUTLINE
Why NoSQL?
The Value of Relational Databases
Getting at Persistent Data, Concurrency, Integration
A (Mostly) Standard Model, Impedance Mismatch,
Application and Integration Databases
Attack of the Clusters
Emergence of NoSQL
Aggregate Data Models: Aggregates: Example of
Relations and Aggregates, Consequences of Aggregate
Orientation, Summarizing Aggregate-Oriented Databases.
More Details on Data Models: Relationships, Schemaless
Databases, Materialized Views, Modeling for Data
Access
Introduction to NOSQL
Relational databases have been the default choice for data storage,
especially in the world of enterprise applications.
NoSQL, often expanded as "Not Only SQL," describes a broad
category of DBMS that differ from traditional relational databases
(SQL databases) in terms of data model, storage, and processing.
Why NoSQL?
There are several reasons why organizations might choose NoSQL
databases over traditional SQL databases for certain use cases:
1. Schema flexibility: NoSQL databases are often schema-less or
schema-flexible, allowing you to store and manage data without a
predefined schema.
This flexibility is particularly beneficial when dealing with
unstructured or semi-structured data, as it allows for easier adaptation
to changing data requirements.
2. Scalability: Many NoSQL databases are designed to scale horizontally,
meaning they can handle a growing amount of data by adding more
servers to a distributed database system.
This makes NoSQL databases suitable for applications with rapidly
increasing data volumes and traffic.
3. Performance: NoSQL databases are optimized for specific use
cases, providing high performance for certain types of queries and
operations.
They are often designed to handle large amounts of read and write
operations efficiently, making them suitable for applications that
require low-latency data access.
4. Variety of data models: NoSQL databases support various data
models, such as document-oriented, key-value, column-family, and
graph databases.
This diversity allows organizations to choose the most appropriate
model for their specific application needs.
5. Agile development and iteration: The flexibility of NoSQL
databases makes them well-suited for agile development
methodologies.
Developers can quickly iterate on their applications without being
constrained by a rigid schema, making it easier to adapt to changing
requirements.
6. Handling large amounts of unstructured data: NoSQL
databases are often better equipped to handle unstructured or semi-
structured data, which is common in modern applications.
This is particularly useful in scenarios where data doesn't fit neatly
into tables with predefined relationships.
7. Cost-effectiveness: NoSQL databases can be more cost-effective in
certain scenarios, especially when dealing with large-scale distributed
systems.
They can leverage commodity hardware and scale horizontally,
potentially reducing infrastructure costs.
WHY RELATIONAL DATABASES BECAME SO
DOMINANT, AND WHY WE NEED NOSQL?
Advantages
With an application database, only the team using the application
needs to know about the database structure, which makes it
much easier to maintain and evolve the schema.
Since the application team controls both the database and the
application code, the responsibility for database integrity can
be put in the application code.
ATTACK OF THE CLUSTERS
Large sets of data in websites appeared as: links, social networks,
activity in logs, mapping data.
Coping with the increase in data and traffic required more
computing resources.
To handle this kind of increase, you have two choices: up or out.
1. Scaling up implies:
bigger machines
more processors
more disk storage
more memory
Scaling up disadvantages:
Bigger machines get more and more expensive.
There are real limits as size increases.
2. Use lots of small machines in a cluster:
The alternative is to use lots of small machines in a cluster.
A cluster of small machines uses cheaper hardware.
It is also more resilient: while individual machine failures are common, the
overall cluster can be built to keep going despite such failures, providing high
reliability.
Cluster disadvantages
Relational databases are not designed to be run on clusters.
Relational databases could also be run as separate servers for different sets
of data, effectively sharding the database.
Even though this separates the load, all the sharding has to be controlled by
the application which has to keep track of which database server to talk to for
each bit of data.
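The burden that application-controlled sharding places on the application can be sketched as follows. This is a minimal illustration, not how any particular database does it: the server names in SHARDS, the in-memory dictionaries standing in for database servers, and the helper functions are all assumptions made for the example.

```python
import hashlib

# Hypothetical shard servers; the names are illustrative only.
SHARDS = ["db-server-0", "db-server-1", "db-server-2"]

# In-memory dictionaries stand in for the separate database servers.
stores = {name: {} for name in SHARDS}


def shard_for(key: str) -> str:
    """Pick the server responsible for a key using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


def put(key, value):
    # The application, not the database, decides where the data goes.
    stores[shard_for(key)][key] = value


def get(key):
    # Every read must repeat the same routing decision.
    return stores[shard_for(key)].get(key)


put("customer:1", {"name": "Martin"})
put("customer:2", {"name": "Pramod"})
print(get("customer:1"))
```

Every query in the application has to go through routing code like shard_for, and queries that span several shards have to be assembled by the application itself.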
The mismatch between relational databases and clusters led
some organizations to consider an alternative route to data
storage.
Two companies in particular
1. Google
2. Amazon
Both were running large clusters
They were capturing huge amounts of data
In the 2000s, both companies produced brief but highly influential
papers about their efforts: BigTable from Google and Dynamo
from Amazon.
Amazon and Google operate at scales far beyond those of most
organizations, so the solutions they needed may not be
relevant to an average organization.
THE EMERGENCE OF NOSQL
The term “NoSQL” first made its appearance in the late 90s as the
name of an open-source relational database.
The usage of “NoSQL” that we recognize today traces back to a
meetup on June 11, 2009 in San Francisco.
The call for the meetup asked for “open-source, distributed, nonrelational databases.”
Some of the NOSQL systems are: Voldemort, Cassandra,
Dynomite, HBase, Hypertable, CouchDB, and MongoDB
Relational databases use ACID transactions to handle consistency
across the whole database.
NoSQL databases offer a range of options for consistency and
distribution.
Some organizations developed their own systems:
Google developed a proprietary NOSQL system known as BigTable.
Apache HBase is an open-source system based on similar concepts; both are
column-based (wide column) stores.
Amazon developed a NOSQL system called DynamoDB.
Facebook developed a NOSQL system called Cassandra.
MongoDB and CouchDB are classified as document-based NOSQL systems,
or document stores.
Graph-based NOSQL systems, or graph databases, include
Neo4J and GraphBase.
Some systems, such as OrientDB, combine concepts from many of the categories.
CATEGORIES OF NOSQL SYSTEMS
NOSQL systems have been characterized into four major
categories:
1. Document-based NOSQL systems: These systems store data
in the form of documents using well-known formats such as JSON.
Documents are accessible via document id or indexes.
2. NOSQL key-value stores: These systems have a simple data model based
on fast access by the key to the value associated with the key.
3. Column-based or wide column NOSQL systems: These
systems partition a table by column into column families, where
each column family is stored in its own files.
4. Graph-based NOSQL systems: Data is represented as graphs,
and related nodes can be found by traversing the edges using
path expressions.
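The four categories above can be contrasted with a small sketch that uses plain Python structures in place of real databases. The sample names and values are invented for illustration, and the functions read_document and read_row are hypothetical helpers, not the API of any NoSQL product.

```python
import json

# 1. Document store: documents (here JSON text) looked up by document id.
documents = {
    "doc-1": json.dumps({"name": "Asha", "city": "Bengaluru"}),
}

# 2. Key-value store: an opaque value reached only through its key.
key_value = {"user:1": b"Asha|Bengaluru"}

# 3. Column-family store: each column family keeps its own columns per row key.
column_families = {
    "profile": {"row-1": {"name": "Asha"}},
    "address": {"row-1": {"city": "Bengaluru"}},
}

# 4. Graph store: nodes plus the edges that connect them.
nodes = {"asha", "ravi"}
edges = {"asha": ["ravi"], "ravi": []}


def read_document(doc_id):
    """The store understands the document structure, so fields are visible."""
    return json.loads(documents[doc_id])


def read_row(row_key):
    """Assemble a full row by visiting each column family separately."""
    row = {}
    for family in column_families.values():
        row.update(family.get(row_key, {}))
    return row


print(read_document("doc-1")["city"])
print(read_row("row-1"))
```

The key-value entry holds the same information as the document, but the store cannot look inside it; only the application knows how to interpret the bytes.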
APPLICATIONS
NoSQL databases are often used in applications where there is a high
volume of data that needs to be processed and analyzed in real-time.
Social media analytics
E-commerce
Gaming
Content management systems
Document management, and
Customer relationship management.
NoSQL databases are used in real-time web applications and big data,
and their use is increasing over time.
However, NoSQL databases may not be suitable for all applications, as
they may not provide the same level of data consistency and
transactional guarantees as traditional relational databases.
It is important to carefully evaluate the specific needs of an application
when choosing a database management system.
FEATURES OF NOSQL DATABASES
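1. NoSQL databases do not use SQL as their query language, although some
provide SQL-like query languages, such as Cassandra's CQL. (Manipulation
through shell scripts combined into the usual UNIX pipelines was a feature
of the 1990s relational database named NoSQL, not of today's NoSQL systems.)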
2. They are generally open-source projects.
3. Most NoSQL databases are driven by the need to run on clusters.
4. Relational databases use ACID transactions to handle consistency
across the whole database. NoSQL databases offer a range of options
for consistency and distribution.
5. Graph databases are one style of NoSQL database that uses a
distribution model similar to relational databases but offers a different
data model that makes it better at handling data with complex
relationships.
6. NoSQL databases operate without a schema, allowing fields to be freely
added to database records without having to define any changes in
structure first.
7. NoSQL databases are used in real-time web applications and big
data, and their use is increasing over time.
CHARACTERISTICS OF NOSQL SYSTEMS
1. Schema-less or Schema-flexible:
NoSQL databases typically allow for a flexible schema, i.e., each
record in a database can have a different set of fields, and new
fields can be added without requiring a predefined schema.
2. Non-relational:
NoSQL databases do not use a fixed schema and are not based
on the traditional tabular structure with rows and columns.
3. Horizontal Scalability:
NoSQL databases are designed to scale horizontally, allowing
them to handle larger amounts of data and increased load by
adding more servers to a distributed database.
4. Distributed Architecture:
Many NoSQL databases are designed to be distributed across
multiple servers or nodes, providing improved performance and
fault tolerance
5. High Performance:
NoSQL databases are optimized for specific types of queries and are
capable of providing high-performance read and write operations.
6. Various Data Models:
NoSQL databases support various data models such as document-
oriented (like MongoDB), key-value stores (like Redis), column-family
stores (like Apache Cassandra), and graph databases (like Neo4j).
7. BASE (Basically Available, Soft state, Eventually consistent):
NoSQL databases often prioritize availability and partition tolerance
over strict consistency, i.e., in the event of network partitioning, the
system may continue to operate but may provide inconsistent results
temporarily.
8. Big Data and Unstructured Data Support:
NoSQL databases are often used for handling large volumes of
unstructured or semi-structured data, making them suitable for big
data applications.
9. Simple API:
NoSQL databases typically offer simple APIs for data access and
manipulation, which can be more developer-friendly compared to
SQL-based databases.
10. Open Source:
Many NoSQL databases are open source, allowing for community
collaboration and customization.
11. Availability and Replication:
Many applications that use NOSQL systems require continuous system availability.
The two major replication models used in NOSQL systems are:
Master-slave replication: one copy is the master copy; all write
operations must be applied to the master copy and then propagated
to the slave copies.
Master-master replication: allows reads and writes at any of the
replicas but may not guarantee that reads at nodes that store
different copies see the same values.
12. Sharding of Files:
Sharding distributes the load of accessing the file records to multiple nodes.
It improves load balancing and data availability.
13. Versioning:
Some systems provide storage of multiple versions of the data items, with the
timestamps of when each data version was created.
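The master-slave replication model described in the availability and replication item above can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: the Node and MasterSlaveCluster classes are hypothetical, propagation is triggered by hand, and real systems handle failures and ordering in ways not shown here.

```python
class Node:
    """One copy of the data, held in a plain dict."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class MasterSlaveCluster:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves
        self.pending = []  # writes applied to the master but not yet propagated

    def write(self, key, value):
        # All write operations are applied to the master copy first.
        self.master.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        # Later, the writes are propagated to the slave copies in order.
        for key, value in self.pending:
            for slave in self.slaves:
                slave.data[key] = value
        self.pending.clear()

    def read(self, slave_index, key):
        # Reads may be served by a slave, which can lag behind the master.
        return self.slaves[slave_index].data.get(key)


cluster = MasterSlaveCluster(Node("master"), [Node("slave-1"), Node("slave-2")])
cluster.write("stock:apple", 10)
stale = cluster.read(0, "stock:apple")  # None until propagation happens
cluster.propagate()
fresh = cluster.read(0, "stock:apple")
print(stale, fresh)
```

The gap between stale and fresh is the window in which a read from a slave copy can return an older value than the master holds.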
ADVANTAGES OF NOSQL
1. High scalability: NoSQL databases use sharding for horizontal
scaling, which is comparatively easy to implement. Examples of
horizontally scaling databases are MongoDB and Cassandra. NoSQL can
handle a huge amount of data because of this scalability.
2. Flexibility: NoSQL databases are designed to handle unstructured
or semi-structured data, which means that they can accommodate
dynamic changes to the data model.
3. High availability: The automatic replication feature in NoSQL
databases makes them highly available.
4. Scalability: NoSQL databases are highly scalable, i.e., they can
handle large amounts of data and traffic with ease. This makes
them a good fit for applications that need to handle large amounts of
data or traffic.
5. Performance: NoSQL databases are designed to handle large
amounts of data and traffic, so for such workloads they can offer improved
performance compared to traditional relational databases.
6. Cost-effectiveness: NoSQL databases are often more cost-effective
than traditional relational databases, as they are typically less complex
and do not require expensive hardware or software.
7. Agility: Ideal for agile development.
DISADVANTAGES OF NOSQL
1. Lack of standardization: There are many different types of NoSQL
databases, each with its own unique strengths and weaknesses. This
lack of standardization can make it difficult to choose the right
database for a specific application
2. Lack of ACID compliance: Many NoSQL databases are not fully
ACID-compliant, i.e., they do not guarantee the consistency, integrity, and
durability of data across the whole database. This can be a drawback for
applications that require strong data consistency guarantees.
3. Narrow focus: NoSQL databases have a narrow focus, as they are
mainly designed for storage and provide comparatively little other functionality.
Relational databases are a better choice than NoSQL in the field of
transaction management.
4. Open-source: Most NoSQL databases are open-source projects, and there is
no reliable standard for NoSQL yet. In other words, two database systems are
likely to differ from each other.
5. Lack of support for complex queries: NoSQL databases are not
designed to handle complex queries, which means that they are not a
good fit for applications that require complex data analysis or
reporting.
6. Lack of maturity: NoSQL databases are relatively new and lack the
maturity of traditional relational databases. This can make them less
reliable and less secure than traditional databases.
7. Management challenge: Data management in NoSQL is much
more complex than in a relational database. NoSQL, in particular, has a
reputation for being challenging to install and even harder to
manage on a daily basis.
8. GUI is not available: GUI tools to access the database are not
widely available in the market.
9. Backup: Backup is a great weak point for some NoSQL databases
like MongoDB.
10. Large document size: Some database systems, like MongoDB and
CouchDB, store data in a JSON-based format. Because descriptive field names
are repeated in every document, documents can be quite large.
WHEN SHOULD NOSQL BE USED
When a huge amount of data needs to be stored and retrieved.
The relationship between the data you store is not that important.
The data changes over time and is not structured.
Support for constraints and joins is not required at the database level.
The data is growing continuously and you need to scale the database
regularly to handle the data.
Flexible Schema Requirements
Scalability and High Throughput
Rapid Development and Agile Iterations
Big Data Applications
High Availability and Fault Tolerance
Real-time Applications (IoT)
IMPORTANT FEATURES OF NOSQL DATABASES
1. Not using the relational model
2. Running well on clusters
3. Open-source
4. Built for the 21st century web applications
5. Schemaless
AGGREGATE DATA MODELS
Data Model: Model through which we identify and manipulate
our data. It describes how we interact with the data in the
database.
Storage model: Model which describes how the database
stores and manipulates the data internally.
In NoSQL, “data model” refers to the model by which the
database organizes data, more formally called a metamodel.
The dominant data model is the relational data model, which uses
a set of tables:
Each table has rows.
Each row represents an entity.
Columns describe the entity.
A column may refer to a relationship.
Each NoSQL solution has a different model that it uses:
1. Key-value
2. Document
3. Column-family
4. Graph
Without a schema:
You can easily store whatever you need.
You can easily add new things as you discover them.
A schemaless store makes it easier to deal with nonuniform data: data
where each record has a different set of fields. It allows each record to
contain just what it needs—no more, no less.
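A minimal sketch of nonuniform data in a schemaless store, using plain Python dictionaries as stand-in records. The field names and values are invented for illustration and no real database API is used.

```python
# Each record is a plain dict, so records need not share the same fields.
records = []

records.append({"id": 1, "name": "Asha", "email": "asha@example.com"})
records.append({"id": 2, "name": "Ravi", "phone": "555-0100"})

# A new field can be added later without defining any structure change first.
records.append({"id": 3, "name": "Meena", "loyalty_points": 120})


def field_names(rows):
    """Collect every field that appears in at least one record."""
    names = set()
    for row in rows:
        names.update(row)
    return names


# Code that reads the data still has to cope with fields that are absent.
emails = [row.get("email") for row in records]

print(sorted(field_names(records)))
print(emails)
```

The store no longer enforces a structure, so the knowledge of which fields may be present moves into the application code that reads the records.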
GRAPH DATABASES
Graph-based data models store data in nodes that are connected by edges.
These data models are widely used for storing huge volumes of complex
aggregates and multidimensional data having many interconnections
between them.
USE CASES:
Graph-based Data Models are used in social networking
sites to store interconnections.
It is used in fraud detection systems.
This Data Model is also widely used in Networks and IT
operations.
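The social networking use case can be illustrated with a small traversal sketch. The friends graph below is invented sample data held in an adjacency list, and within_hops is a hypothetical helper using breadth-first search; a real graph database would offer its own query language for this.

```python
from collections import deque

# Hypothetical friendship edges stored as an adjacency list.
friends = {
    "asha": ["ravi", "meena"],
    "ravi": ["asha", "kiran"],
    "meena": ["asha"],
    "kiran": ["ravi", "divya"],
    "divya": ["kiran"],
}


def within_hops(graph, start, max_hops):
    """Return nodes reachable from start in at most max_hops edges (BFS)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    distance.pop(start)  # the start node is not its own connection
    return distance


friends_of_friends = within_hops(friends, "asha", 2)
print(friends_of_friends)
```

Each step follows edges directly from one node to the next, which is why data with many interconnections suits this model.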
MATERIALIZED VIEWS
Materialized views are views that are computed in advance and cached
on disk.
Materialized views are effective for data that is read heavily.
Materialized views are like pre-made summaries of data stored in a
database to save time when repeatedly accessing or analyzing the same
information.
Although NoSQL databases don’t have views, they may have precomputed and
cached queries.
Example: Imagine a grocery store tracking daily sales:
Without materialized views: Every time a manager asks for "total sales
for the day," the system calculates the total by going through all sales
records again.
With materialized views: The system pre-calculates and stores "daily
total sales" at the end of the day. The next time the manager asks, it
retrieves the stored value instantly, saving time.
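The grocery store example can be sketched in a few lines. The sales records are invented sample data, and a plain dictionary stands in for the stored, precomputed result; this is an illustration of the idea rather than the mechanism of any specific database.

```python
from collections import defaultdict

# Hypothetical sales records: (day, amount).
sales = [
    ("2024-01-01", 120.0),
    ("2024-01-01", 80.5),
    ("2024-01-02", 42.0),
]


def total_without_view(day):
    """Scan every sales record each time the question is asked."""
    return sum(amount for d, amount in sales if d == day)


def build_daily_totals():
    """Precompute the totals once, for example at the end of the day."""
    totals = defaultdict(float)
    for day, amount in sales:
        totals[day] += amount
    return dict(totals)


daily_totals = build_daily_totals()  # the stored, precomputed "view"


def total_with_view(day):
    """Answer from the stored value without scanning the sales records."""
    return daily_totals.get(day, 0.0)


print(total_without_view("2024-01-01"))
print(total_with_view("2024-01-01"))
```

The trade-off is freshness: if new sales are recorded after build_daily_totals runs, the stored totals are out of date until they are rebuilt.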
MODELLING FOR DATA ACCESS
When modeling data aggregates, we need to consider how the data is going to be
read as well as the side effects on data related to those aggregates.
When all the data for a customer is embedded in a single aggregate, the
application can read the customer’s information and all the related data by
using the key.
If the requirements are to read orders or products sold in each order, the whole
object has to be read and then parsed on the client side to build the results.
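The access pattern described above can be sketched with a dictionary standing in for a key-value store. The customer aggregate, its field names, and the helper functions are assumptions made for illustration only.

```python
# Hypothetical key-value store holding one aggregate per customer.
store = {
    "customer:1": {
        "name": "Asha",
        "orders": [
            {"order_id": 99, "items": [{"product": "NoSQL book", "qty": 1}]},
            {
                "order_id": 100,
                "items": [
                    {"product": "Laptop", "qty": 1},
                    {"product": "NoSQL book", "qty": 2},
                ],
            },
        ],
    },
}


def read_customer(customer_id):
    """One lookup by key returns the customer and all related data."""
    return store[f"customer:{customer_id}"]


def products_sold(customer_id):
    """The whole aggregate is read, then parsed on the client side."""
    aggregate = read_customer(customer_id)
    totals = {}
    for order in aggregate["orders"]:
        for item in order["items"]:
            totals[item["product"]] = totals.get(item["product"], 0) + item["qty"]
    return totals


print(read_customer(1)["name"])
print(products_sold(1))
```

Reading by customer key is a single lookup, while the product totals require fetching and walking the entire aggregate; that is the cost this kind of modeling decision has to weigh.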