NoSQL Lecture Notes
Introduction
• When people use the term “NoSQL database,” they typically use it to
refer to any non-relational database.
• Some say the term “NoSQL” stands for “non-SQL” while others say it
stands for “not only SQL.”
• Either way, most agree that NoSQL databases store data in a more
natural and flexible way.
• Strictly speaking, NoSQL names a family of database management
approaches, whereas SQL is a query language. NoSQL databases have
query languages of their own.
Types of NoSQL databases
• Document-oriented databases
• Key-value databases
• Column-oriented databases
• Graph databases
Document-oriented databases
• A document-oriented database stores data in documents similar to
JSON (JavaScript Object Notation) objects.
• Each document contains pairs of fields and values. The values can
typically be a variety of types, including things like strings, numbers,
Booleans, arrays, or even other objects.
• A document database offers a flexible data model that is well suited
to semi-structured and unstructured data sets.
• They also support nested structures, making it easy to represent
complex relationships or hierarchical data.
• Example: MongoDB
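The document model described above can be sketched with a plain Python dictionary and the standard json module. This is only an illustration of the shape of a document, not the API of any particular database; the customer record and all of its field names are hypothetical.

```python
import json

# A hypothetical "customer" document; every field name here is illustrative.
customer = {
    "_id": "c-1001",
    "name": "Asha Rao",                              # string
    "active": True,                                  # Boolean
    "loyalty_points": 420,                           # number
    "tags": ["newsletter", "premium"],               # array
    "address": {"city": "Pune", "zip": "411001"},    # nested object
    "orders": [                                      # array of nested objects
        {"order_id": "o-1", "total_cents": 5990},
        {"order_id": "o-2", "total_cents": 1250},
    ],
}

# Serialise to JSON text and parse it back, as a document store might do.
encoded = json.dumps(customer, indent=2)
decoded = json.loads(encoded)

# Nested data is reached by walking the structure; no join is needed.
order_total = sum(order["total_cents"] for order in decoded["orders"])
print(decoded["address"]["city"])
print(order_total)
```

Because the orders live inside the customer document, the hierarchy is read back in a single lookup.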
Key-value databases
• A key-value store is a simpler type of database where each item
consists of a key and its associated value.
• Each key is unique and associated with a single value.
• They are commonly used for caching and session management. Many of
them, such as Redis, keep data in memory, which gives very fast reads
and writes.
• Examples are Amazon DynamoDB and Redis.
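A minimal sketch of the key-value idea, assuming an in-memory store with optional expiry for session data. The class KeyValueStore and its set/get methods are hypothetical names chosen for this example; they are not the API of Redis or DynamoDB.

```python
import time


class KeyValueStore:
    """Tiny in-memory key-value store with optional expiry (illustrative only)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._data[key] = (value, expires_at)  # a key maps to exactly one value

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazily evict an expired entry
            return default
        return value


store = KeyValueStore()
store.set("session:42", {"user": "asha"}, ttl_seconds=60)
# Writing the same key again replaces the previous value.
store.set("session:42", {"user": "asha", "cart": 3}, ttl_seconds=60)

print(store.get("session:42"))
print(store.get("session:99", default="missing"))
```

Reads and writes are single dictionary operations, which is why this access pattern suits caches and session stores.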
Column-oriented databases
• While an RDBMS stores data in rows and reads it row by row, column-
oriented databases are organized as a set of columns.
• When you want to run analytics on a small number of columns in a
table, you can read just those columns without filling memory with
unwanted data.
• The values in a column share the same type, so they compress more
efficiently, which makes reads even faster.
• A column-oriented database can efficiently aggregate the values of a
given column (adding up sales for the year, for example), so its main
use case is analytics.
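The difference between the two layouts can be sketched with ordinary Python lists. The sales records below are made-up data, and the dictionary of lists only imitates a columnar layout in memory; real column stores add compression and on-disk formats that are not shown here.

```python
# Row-oriented layout: one complete record per row (hypothetical sales data).
rows = [
    {"order_id": 1, "region": "north", "year": 2023, "amount": 120},
    {"order_id": 2, "region": "south", "year": 2023, "amount": 80},
    {"order_id": 3, "region": "north", "year": 2024, "amount": 200},
]

# Column-oriented layout: one list per column, aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Adding up sales for 2023 touches only the "year" and "amount" columns;
# "order_id" and "region" are never read.
total_2023 = sum(
    amount
    for year, amount in zip(columns["year"], columns["amount"])
    if year == 2023
)
print(total_2023)
```

The same aggregation over the row layout would have to visit every field of every record.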
Column-oriented databases Cont.
• While column-oriented databases are great for analytics, writes are
more expensive: inserting or updating a single record touches every
column, so one logical write becomes multiple write events on disk,
which also makes it harder to keep the record consistent.
• Relational databases don't suffer from this problem as row data is
written contiguously to disk.
Graph databases
• A graph database stores data in the form of nodes and edges.
• Nodes typically store information about people, places, and things
(like nouns), while edges store information about the relationships
between the nodes.
• They work well for highly connected data, where the relationships or
patterns may not be very obvious initially.
• Examples of graph databases are Neo4j and Amazon Neptune.
MongoDB also provides graph traversal capabilities using the
$graphLookup stage of the aggregation pipeline.
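The node-and-edge model can be sketched in plain Python with a breadth-first traversal. The people, the relationship names, and the helper functions neighbours and reachable are all hypothetical; a real graph database would express this traversal in its own query language.

```python
from collections import deque

# Nodes hold information about things; edges hold the relationships.
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
    "dave": {"kind": "person"},
    "pune": {"kind": "place"},
}
edges = [  # (source, relationship, target)
    ("alice", "FRIEND_OF", "bob"),
    ("bob", "FRIEND_OF", "carol"),
    ("carol", "FRIEND_OF", "dave"),
    ("alice", "LIVES_IN", "pune"),
]


def neighbours(node, relationship):
    """Targets of edges leaving `node` with the given relationship."""
    return [dst for src, rel, dst in edges if src == node and rel == relationship]


def reachable(start, relationship, max_depth):
    """Breadth-first search: nodes reachable within `max_depth` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in neighbours(node, relationship):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found


# Friends and friends-of-friends of alice.
print(reachable("alice", "FRIEND_OF", max_depth=2))
```

Raising max_depth widens the search by one relationship hop at a time, which is the kind of query that becomes awkward with repeated joins.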
Features of NoSQL Databases
• Distributed computing
• Scaling
• Flexible schemas and rich query language
• Ease of use for developers
• Partition tolerance
• High availability
CAP Theorem
• The CAP theorem, also known as Brewer's theorem, is a fundamental
principle in distributed systems that states it is impossible for a
distributed data store to simultaneously provide all three of the
following guarantees:
• Consistency (C):
• Every read receives the most recent write or an error. In other words, all
nodes in the system see the same data at the same time.
• If a user reads from one node and then immediately reads from another, they
will get the same data.
CAP Theorem Cont.
• Availability (A):
• Every request (read or write) receives a non-error response, although the
response is not guaranteed to reflect the most recent write. The system
remains operational and responsive to requests at all times.
• Partition Tolerance (P):
• The system continues to operate despite network partitions or
communication breakdowns between nodes.
• This means that even if some parts of the network are unreachable, the
system as a whole remains functional.
CAP Theorem Cont.
• The CAP theorem asserts that in a distributed data store, you can
achieve at most two of the three guarantees at the same time:
• CA (Consistency + Availability): The system is consistent and available
as long as there is no partition. However, if a partition occurs, one of
these properties must be sacrificed.
• CP (Consistency + Partition Tolerance): The system remains
consistent and tolerates partitions, but may not be fully available
during the partition.
• AP (Availability + Partition Tolerance): The system remains available
and tolerant to partitions, but consistency might be compromised
during network partitions.
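The CP versus AP trade-off can be illustrated with a toy model of two replicas. This is a deliberately simplified sketch: the class TwoReplicaStore, its mode flag, and the Unavailable exception are hypothetical, and the CP mode here rejects every write during a partition, whereas real CP systems typically keep serving from the side that holds a majority.

```python
class Unavailable(Exception):
    """Raised when the store refuses a request to protect consistency."""


class TwoReplicaStore:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False  # True when the replicas cannot communicate

    def write(self, replica, key, value):
        if not self.partitioned:
            for data in self.replicas:  # healthy network: update both copies
                data[key] = value
            return
        if self.mode == "CP":
            raise Unavailable("cannot reach the other replica")
        self.replicas[replica][key] = value  # AP: accept locally, copies diverge

    def read(self, replica, key):
        return self.replicas[replica].get(key)


cp = TwoReplicaStore("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    cp_outcome = "accepted"
except Unavailable:
    cp_outcome = "rejected"

ap = TwoReplicaStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)

print(cp_outcome, cp.read(0, "x"), cp.read(1, "x"))
print(ap.read(0, "x"), ap.read(1, "x"))
```

In CP mode both replicas still agree but the write was refused; in AP mode the write succeeded but the two replicas now return different values until the partition heals.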
RDBMS Vs NoSQL: Data Structure
• Relational Databases:
• Structure: Data is organized into tables (relations) with rows and columns.
• Schema: A predefined schema is required. The structure of the data (i.e., table
columns and their data types) must be defined before any data is entered.
• Normalization: Often normalized to reduce redundancy and ensure data integrity
through relationships (foreign keys).
• NoSQL Databases:
• Structure: Data can be organized in various formats, including key-value pairs,
documents (JSON, BSON), wide columns, or graphs.
• Schema: Typically schemaless or flexible, so the shape of the data can change
without a schema migration.
• Normalization: Often denormalized to improve performance by reducing the need
for complex joins.
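The schema difference can be sketched with Python's built-in sqlite3 module standing in for a relational database and a list of dictionaries standing in for a document collection. The users table and the nickname field are hypothetical.

```python
import sqlite3

# Relational side: the columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Asha"))

try:
    # "nickname" was never declared, so the database refuses the insert.
    conn.execute(
        "INSERT INTO users (id, name, nickname) VALUES (?, ?, ?)",
        (2, "Ben", "benny"),
    )
    relational_result = "accepted"
except sqlite3.OperationalError as exc:
    relational_result = f"rejected: {exc}"
conn.close()

# Document side: each document carries its own fields.
users_collection = [
    {"_id": 1, "name": "Asha"},
    {"_id": 2, "name": "Ben", "nickname": "benny"},  # new field, no migration
]
nicknames = [user.get("nickname") for user in users_collection]

print(relational_result)
print(nicknames)
```

The flexibility has a cost: readers of the collection must cope with documents where the field is missing, as the use of get shows.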
RDBMS Vs NoSQL: Data Relationships
• Relational Databases:
• Relationships: Use primary and foreign keys to establish relationships
between tables.
• Joins: Data is often retrieved through complex join operations across multiple
tables.
• NoSQL Databases:
• Relationships: Can store nested data structures (especially document-based
databases) but generally avoid complex relationships.
• Joins: Joins are either not supported or are limited, encouraging data to be
stored together when frequently accessed together.
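The two ways of handling a relationship can be compared side by side. The sketch below uses sqlite3 for the foreign-key and join approach and a nested dictionary for the embedded approach; the authors and posts data are made up.

```python
import sqlite3

# Relational approach: two tables linked by a foreign key, combined with a join.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT
    );
    INSERT INTO authors VALUES (1, 'Asha');
    INSERT INTO posts VALUES (10, 1, 'Intro to CAP'), (11, 1, 'Sharding basics');
    """
)
joined = conn.execute(
    "SELECT a.name, p.title "
    "FROM authors a JOIN posts p ON p.author_id = a.id "
    "ORDER BY p.id"
).fetchall()
conn.close()

# Document approach: the posts are stored inside the author document.
author_doc = {
    "_id": 1,
    "name": "Asha",
    "posts": [{"title": "Intro to CAP"}, {"title": "Sharding basics"}],
}
embedded = [(author_doc["name"], post["title"]) for post in author_doc["posts"]]

print(joined)
print(embedded)
```

Both approaches return the same pairs; the embedded form gets them from one record, at the price of duplicating author data if posts are ever stored elsewhere as well.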
RDBMS Vs NoSQL: Scalability
• Relational Databases:
• Vertical Scaling: Traditionally scale by increasing the resources (CPU, RAM) of
a single server.
• Horizontal Scaling: Can be challenging and complex, involving sharding or
replication across multiple servers.
• NoSQL Databases:
• Horizontal Scaling: Designed for horizontal scaling, where data is distributed
across multiple servers or nodes easily.
• Elasticity: Easily handle large amounts of unstructured data and high-velocity
data streams by adding more nodes to the cluster.
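One simple way to distribute data across nodes is to hash each key and take the result modulo the number of shards. The sketch below assumes that scheme; the functions shard_for and distribute are hypothetical helpers, and production systems often use consistent hashing or range partitioning instead so that fewer keys move when nodes are added.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(keys, shard_count):
    """Group keys by the shard that would own them."""
    shards = [[] for _ in range(shard_count)]
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards


keys = [f"user:{i}" for i in range(1000)]
three_nodes = distribute(keys, 3)
four_nodes = distribute(keys, 4)

# How evenly the keys spread, and how many change owner when a node is added.
print([len(shard) for shard in three_nodes])
print([len(shard) for shard in four_nodes])
moved = sum(shard_for(key, 3) != shard_for(key, 4) for key in keys)
print(moved)
```

With plain modulo hashing, a large share of the keys changes shard when the node count changes, which is the main motivation for the alternative schemes mentioned above.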
RDBMS Vs NoSQL: Query Language
• Relational Databases:
• SQL: Use Structured Query Language (SQL) for defining and manipulating
data.
• Standardized: SQL is a standardized language, which means the skills are
transferable across different relational databases.
• NoSQL Databases:
• Varied Query Languages: The query language depends on the database (e.g.,
MongoDB uses a JSON-like query language, Cassandra uses CQL).
• Flexibility: Query languages can be more flexible but lack the standardization
found in SQL.
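To make the contrast concrete, the sketch below runs one filter twice: once as SQL through sqlite3, and once as a JSON-like filter document in the style MongoDB uses. The matches function is a hypothetical, minimal evaluator written for this example that understands only equality and the $gt operator; it is not MongoDB's implementation or driver API.

```python
import sqlite3

people = [("Asha", 34), ("Ben", 28), ("Chen", 41)]

# SQL: a declarative text query against a table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", people)
sql_names = [
    row[0]
    for row in conn.execute("SELECT name FROM people WHERE age > 30 ORDER BY name")
]
conn.close()

# JSON-like query: the filter is itself a document.
query = {"age": {"$gt": 30}}


def matches(document, query):
    """Hypothetical evaluator: supports equality and $gt only."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            if "$gt" in condition:
                if value is None or not value > condition["$gt"]:
                    return False
        elif value != condition:
            return False
    return True


documents = [{"name": name, "age": age} for name, age in people]
doc_names = sorted(doc["name"] for doc in documents if matches(doc, query))

print(sql_names)
print(doc_names)
```

Both forms select the same people; the difference lies in how the query is expressed and in how portable that expression is between products.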
RDBMS Vs NoSQL: Use Cases
• Relational Databases:
• Transaction Systems: Ideal for applications requiring complex queries and
transactions, such as banking systems, ERP systems, and CRM systems.
• Data Integrity: Suitable for applications where data integrity and ACID
(Atomicity, Consistency, Isolation, Durability) properties are critical.
• NoSQL Databases:
• Big Data and Real-Time Applications: Suitable for handling large volumes of
unstructured or semi-structured data, such as social media platforms, IoT
applications, and real-time analytics.
• Flexible Data Models: Useful in scenarios where the data model is expected
to evolve rapidly, such as content management systems or user profiles.
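The atomicity part of ACID, which matters for the banking-style use cases above, can be sketched with sqlite3. The accounts table, the CHECK constraint, and the transfer function are hypothetical; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed; the credit was rolled back


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # alice cannot cover this
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
conn.close()

print(ok, failed)
print(balances)
```

In the failed transfer the credit to bob is executed first and then undone, so the balances reflect only the successful transfer.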