
NoSQL Lec

NoSQL databases are designed to handle unstructured or semi-structured data, addressing the limitations of traditional relational databases, particularly in scaling and data storage. They come in various types, including key-value, document, column-family, and graph databases, each with unique characteristics and use cases. NoSQL databases are built for horizontal scalability and flexibility, making them suitable for modern web applications and large data volumes.

Uploaded by

mzaheerlion
Copyright © All Rights Reserved

NoSQL Databases

Overview

• Relational databases and the need for NoSQL
• Common characteristics of NoSQL databases
• Types of NoSQL databases
  • Key-value
  • Document
  • Column-family
  • Graph
• NoSQL and consistency
Resources for this section

• NoSQL Distilled (Sadalage & Fowler, 2013)

• Key points from this book are summarised here:

• http://martinfowler.com/articles/nosqlKeyPoints.html
Unlike traditional relational database management systems (RDBMS), which use Structured Query Language (SQL) and are table-based, NoSQL databases can store and retrieve unstructured or semi-structured data.
Why is NoSQL needed?
• Relational databases have been a successful technology for twenty years, providing persistence, concurrency control, and an integration mechanism

• However, application developers have been frustrated with the impedance mismatch between the relational model and the in-memory data structures (objects, arrays, etc.)

• Impedance mismatch: the effort required to split up an object (e.g. an order) into separate tables when storing it in a relational database (e.g. an order might have elements in the product table, customer table, order line table, etc.), only to then have to put it back together again (using joins)
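A minimal sketch of the impedance mismatch, using Python's built-in sqlite3 module. The order, table names and column names are hypothetical, chosen only to mirror the order example: storing the object means splitting it across three tables, and loading it means joining them and rebuilding the nested structure by hand.

```python
import sqlite3

# Hypothetical in-memory order: one object with a nested customer and line items.
order = {
    "order_id": 1,
    "customer": {"customer_id": 42, "name": "Ann"},
    "lines": [
        {"product": "Laptop", "quantity": 1},
        {"product": "Mouse", "quantity": 2},
    ],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_lines (order_id INTEGER, product TEXT, quantity INTEGER);
""")

# Storing: split the one object into rows in three separate tables.
customer = order["customer"]
conn.execute("INSERT INTO customers VALUES (?, ?)", (customer["customer_id"], customer["name"]))
conn.execute("INSERT INTO orders VALUES (?, ?)", (order["order_id"], customer["customer_id"]))
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(order["order_id"], line["product"], line["quantity"]) for line in order["lines"]],
)

# Loading: join the tables back together and rebuild the nested object by hand.
rows = conn.execute("""
    SELECT o.order_id, c.customer_id, c.name, l.product, l.quantity
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    JOIN order_lines AS l ON l.order_id = o.order_id
    WHERE o.order_id = ?
    ORDER BY l.product
""", (1,)).fetchall()

rebuilt = {
    "order_id": rows[0][0],
    "customer": {"customer_id": rows[0][1], "name": rows[0][2]},
    "lines": [{"product": r[3], "quantity": r[4]} for r in rows],
}

# Aggregate-oriented alternative: the whole order is written and read as one unit.
aggregate_store = {order["order_id"]: order}
```

The relational version needs three kinds of insert and a three-way join to round-trip one order; the aggregate version needs one write and one read, at the cost of making other access patterns (for example, all order lines for one product) harder.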
Why is NoSQL needed?

• The vital factor for a change in data storage was the need to support large volumes of data by running on clusters. Relational databases are not designed to run efficiently on clusters.

• NoSQL is an accidental neologism. There is no prescriptive definition; all you can make is an observation of common characteristics.
NoSQL common characteristics

• The common characteristics of NoSQL databases are:
  • Not using the relational model
  • Running well on clusters
  • Open-source
  • Built for the 21st century web
  • Schemaless

• No explicit schema is defined; however, the implicit schema of a NoSQL database must still be managed in some way


The term "running well on clusters" refers to the database's ability to operate efficiently across multiple servers (nodes) that work together as a single system. This capability is crucial for handling large-scale data and high-traffic loads by distributing the workload across many machines.
NoSQL databases are generally designed to run well on clusters due to several inherent characteristics:
• Horizontal Scalability: NoSQL databases are built to scale out by adding more servers to handle increased load, rather than scaling up by adding more power to a single server. This is known as horizontal scalability.
• Distributed Architecture: Many NoSQL databases are designed with a distributed architecture from the ground up. Data is spread across multiple nodes, and the system can continue operating even if some nodes fail.

SQL Databases and Clusters

Traditional SQL (relational) databases can face challenges when running on clusters, primarily due to their design principles:
• Vertical Scalability: SQL databases typically scale vertically, meaning they improve performance by adding more resources (CPU, memory) to a single server rather than distributing the load across multiple servers.
• ACID Transactions: SQL databases prioritize ACID (Atomicity, Consistency, Isolation, Durability) transactions to ensure data integrity and consistency. Ensuring these properties across a distributed system is complex and can introduce significant overhead.
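One common way a NoSQL database spreads data across nodes is hash partitioning (sharding): each key is hashed, and the hash decides which node stores it. The sketch below is a simplified illustration under that assumption; the node names are hypothetical and each "node" is just a Python dict.

```python
import hashlib

# Hypothetical three-node cluster; each node is modelled as a plain dict.
nodes = {"node-a": {}, "node-b": {}, "node-c": {}}
node_names = sorted(nodes)

def node_for(key: str) -> str:
    """Pick the node responsible for a key by hashing it (simple modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return node_names[int.from_bytes(digest[:8], "big") % len(node_names)]

def put(key: str, value) -> None:
    nodes[node_for(key)][key] = value

def get(key: str):
    # Only the responsible node is consulted, so each read touches a single node.
    return nodes[node_for(key)].get(key)

for i in range(1000):
    put(f"user:{i}", {"id": i})

keys_per_node = {name: len(data) for name, data in nodes.items()}
```

Plain modulo partitioning moves most keys whenever a node is added or removed; systems such as Cassandra and Riak use consistent hashing instead, so that only a fraction of the keys have to move.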
NoSQL vs. SQL Comparison

Types
• SQL databases: One type (SQL database) with minor variations
• NoSQL databases: Many different types, including key-value stores, document databases, wide-column stores, and graph databases

Development history
• SQL databases: Developed in the 1970s to deal with the first wave of data storage applications
• NoSQL databases: Developed in the 2000s to deal with limitations of SQL databases, particularly concerning scale, replication, and unstructured data storage

Examples
• SQL databases: MySQL, Postgres, Oracle Database
• NoSQL databases: MongoDB, Cassandra, HBase, Neo4j
NoSQL vs. SQL Comparison

Data storage model
• SQL databases: Individual records (e.g., "employees") are stored as rows in tables, with each column storing a specific piece of data about that record (e.g., "manager," "date hired," etc.), much like a spreadsheet. Separate data types are stored in separate tables, and then joined together when more complex queries are executed. For example, "offices" might be stored in one table, and "employees" in another. When a user wants to find the work address of an employee, the database engine joins the "employee" and "office" tables together to get all the information necessary.
• NoSQL databases: Varies based on NoSQL database type. For example, key-value stores function similarly to SQL databases, but have only two columns ("key" and "value"), with more complex information sometimes stored within the "value" column. Document databases do away with the table-and-row model altogether, storing all relevant data together in a single "document" in JSON, XML, or another format, which can nest values hierarchically.
NoSQL vs. SQL Comparison

Schemas
• SQL databases: Structure and data types are fixed in advance. To store information about a new data item, the schema must first be altered; depending on the database, this can lock tables or require taking the database offline.
• NoSQL databases: Typically dynamic. Records can add new information on the fly, and unlike SQL table rows, dissimilar data can be stored together as necessary. For some databases (e.g., wide-column stores), it is somewhat more challenging to add new fields dynamically.

Scaling
• SQL databases: Typically scale vertically, meaning they improve performance by adding more resources (CPU, memory) to a single server rather than distributing the load across multiple servers.
• NoSQL databases: Built to scale out by adding more servers to handle increased load, rather than scaling up by adding more power to a single server (horizontal scalability).
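A small contrast of the two schema approaches, using sqlite3 for the relational side and a plain list of dicts as a stand-in for a document collection. Table names, field names and values are hypothetical.

```python
import sqlite3

# Relational: a new field must be added to the table definition before it can be stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE employees ADD COLUMN date_hired TEXT")  # schema change first
conn.execute("INSERT INTO employees (id, name, date_hired) VALUES (1, 'Ann', '2024-06-03')")
hired = conn.execute("SELECT date_hired FROM employees WHERE id = 1").fetchone()[0]

# Schemaless: records in the same collection can carry different fields.
employees = [
    {"_id": 1, "name": "Ann"},
    {"_id": 2, "name": "Bob", "date_hired": "2024-06-03", "skills": ["SQL", "NoSQL"]},
]
employees[0]["manager"] = "Bob"  # add a field to one record on the fly, no schema change

# The implicit schema now lives in application code, which must cope with missing fields.
hire_dates = [e.get("date_hired", "unknown") for e in employees]
```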
NoSQL vs. SQL Comparison

Development model
• SQL databases: Mix of open-source (e.g., Postgres, MySQL) and closed-source (e.g., Oracle Database)
• NoSQL databases: Largely open-source (e.g., Cassandra, HBase), though some are proprietary (e.g., Amazon DynamoDB)

Supports transactions
• SQL databases: Yes; updates can be configured to complete entirely or not at all
• NoSQL databases: In certain circumstances and at certain levels (e.g., document level vs. database level)

Data manipulation
• SQL databases: A specific language using SELECT, INSERT, and UPDATE statements, e.g. SELECT fields FROM table WHERE…
• NoSQL databases: Through object-oriented APIs

Consistency
• SQL databases: Can be configured for strong consistency
• NoSQL databases: Depends on product. Some provide strong consistency (e.g., MongoDB), whereas others offer eventual consistency (e.g., Cassandra)
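To illustrate the difference between strong and eventual consistency, here is a deliberately simplified model, not the replication protocol of any particular product: writes go to a primary copy and replicas are updated later, so a read from a replica can briefly return stale data. The class name, key and values are hypothetical.

```python
from collections import deque

class ReplicatedStore:
    """Toy model: writes hit the primary; replicas catch up asynchronously."""

    def __init__(self, replica_count: int = 2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # replication log not yet applied to the replicas

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_primary(self, key):
        # Strong read: always sees the latest acknowledged write.
        return self.primary.get(key)

    def read_replica(self, key, i=0):
        # Eventual read: may be stale until replication catches up.
        return self.replicas[i].get(key)

    def replicate(self):
        # Apply pending writes in order to every replica.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value

store = ReplicatedStore()
store.write("stock:product123", 50)
store.replicate()
store.write("stock:product123", 49)                # e.g. one item sold
stale = store.read_replica("stock:product123")     # replica still has the old value
latest = store.read_primary("stock:product123")
store.replicate()
converged = store.read_replica("stock:product123")  # replicas now agree with the primary
```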
Four types of NoSQL data model

• Key-value

• Document

• Column-family

• Graph
Relational database: order example
Key-Value Stores:
Structure: Stores data as key-value pairs.
Examples: Redis, DynamoDB, Riak.
Key-value

• Both a key and a value are stored – in the case of the order example, everything to do with that one order is simply stored as one value
Key-value

• A key-value store is primarily used when all access to the database is via the primary key

• The key-value model is the simplest and easiest to implement. However, it is inefficient when you are only interested in querying or updating part of a value, because the store treats each value as opaque and must read or rewrite it whole.

Typical applications: Session storage, caching, user profiles, preference management.
Data model: Collection of key-value pairs
Strengths: Simplicity - Very straightforward to use with a simple data model (keys and values).
Weaknesses: Limited Query Capabilities - Lacks advanced querying capabilities found in other data models (e.g., complex joins, aggregations).
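A minimal sketch of the key-value model, assuming a store that only understands keys and opaque byte values (a Python dict stands in for the store; the key format `order:1` is hypothetical). Because the store cannot look inside a value, changing one field means reading, modifying, and rewriting the whole value.

```python
import json
from typing import Optional

# Stand-in key-value store: keys map to opaque byte strings.
store = {}

def put(key: str, value: dict) -> None:
    store[key] = json.dumps(value).encode("utf-8")

def get(key: str) -> Optional[dict]:
    raw = store.get(key)
    return None if raw is None else json.loads(raw)

put("order:1", {"customer_id": 42, "lines": [{"product": "Laptop", "quantity": 1}]})

# Updating part of the value: read the whole value, change it, write it all back.
order = get("order:1")
order["lines"][0]["quantity"] = 2
put("order:1", order)
```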
1. Session Storage
Example: Web Session Data

A key-value store like Redis can be used to store session data, where the key is a session ID and the value is a JSON object containing session details.

Schema:

Key: session_id
Value: JSON object with session data
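A sketch of session storage with expiry. In Redis itself this is typically a `SET` with an expiry time (for example `SET key value EX 1800`); here an in-memory class stands in for the store so the example runs without a server. The class name, the 30-minute TTL and the session contents are assumptions for illustration.

```python
import json
import time
import uuid

class SessionStore:
    """In-memory stand-in for a key-value store with per-key expiry (TTL)."""

    def __init__(self, ttl_seconds=1800, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # session_id -> (expires_at, JSON string)

    def create(self, session: dict) -> str:
        session_id = uuid.uuid4().hex  # key: a random, hard-to-guess session ID
        self._data[session_id] = (self.clock() + self.ttl, json.dumps(session))
        return session_id

    def get(self, session_id: str):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self.clock() >= expires_at:
            del self._data[session_id]  # lazily evict the expired session
            return None
        return json.loads(payload)

# Usage with a fake clock so expiry can be demonstrated instantly.
now = [0.0]
sessions = SessionStore(ttl_seconds=1800, clock=lambda: now[0])
sid = sessions.create({"user_id": "user123", "cart": ["product123"]})
```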
2. Caching
Example: Frequently Accessed Data

A key-value store can be used to cache frequently accessed data, such as the results of expensive database queries.

Schema:

Key: query_result_key
Value: Cached query result
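A sketch of the cache-aside pattern: check the cache first, and only on a miss run the expensive query and store its result under a key derived from the query. The `expensive_query` function and the key format are hypothetical; a dict stands in for the key-value store.

```python
import hashlib
import json

cache = {}        # stand-in for a key-value cache such as Redis
query_calls = 0   # counts how often the expensive query really runs

def expensive_query(category: str) -> list:
    """Hypothetical slow database query, simulated here."""
    global query_calls
    query_calls += 1
    return [{"name": "Laptop", "category": category}]

def query_result_key(sql: str, params: tuple) -> str:
    # Derive a stable cache key from the query text and its parameters.
    raw = json.dumps([sql, list(params)])
    return "query:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()

def cached_products(category: str) -> list:
    key = query_result_key("SELECT * FROM products WHERE category = ?", (category,))
    if key in cache:                      # hit: skip the database entirely
        return json.loads(cache[key])
    result = expensive_query(category)    # miss: run the query...
    cache[key] = json.dumps(result)       # ...and keep the result for next time
    return result

first = cached_products("Electronics")
second = cached_products("Electronics")
```

A real cache also needs an expiry or invalidation rule, so that cached results do not outlive changes to the underlying data.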
3. User Profiles
Example: User Information Storage

A key-value store can be used to store user profiles, where the key is a user ID and the value is a JSON object containing user information.

Schema:

Key: user_id
Value: JSON object with user profile data
Document

• Stores data as documents (typically JSON or XML).
• Data is stored in documents – in the order example, everything to do with one order is stored together in one document

• Data entries are labelled (e.g. customer id, quantity)
Examples: MongoDB, CouchDB, RavenDB.
Document

• Documents are self-describing, hierarchical tree structures which allow nested values associated with each key.
• Unlike key-value stores, document databases can see inside each document, so they can query individual fields rather than only retrieving whole values by key.

Typical applications: Content management systems, catalogs, user-generated content.

Data model: Documents (with multiple values in a document)

Strengths: Flexible Schema - Allows for dynamic and flexible data structures, making it easy to evolve the data model.
Weaknesses: Performance - Can suffer from performance issues with very large documents or highly nested structures.
1. Content Management Systems (CMS)
Example: Article Storage

A document-based NoSQL database like MongoDB can be used to store articles in a CMS, where each document represents an article with various attributes.

Schema:

Collection: articles
Document Fields: title, author, content, tags, published_date
{
"_id": "article123",
"title": "How to Use MongoDB for CMS",
"author": "John Doe",
"content": "This article explains how to use MongoDB for content management systems...",
"tags": ["MongoDB", "CMS", "NoSQL"],
"published_date": "2024-06-03T12:00:00Z"
}

{
"_id": "article456",
"title": "Best Practices for NoSQL Databases",
"author": "Jane Smith",
"content": "In this article, we discuss the best practices for using NoSQL databases...",
"tags": ["NoSQL", "Database", "Best Practices"],
"published_date": "2024-06-02T11:00:00Z"
}
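Because the database can see inside each document, it can filter on fields such as `tags`. In MongoDB, a query like `db.articles.find({"tags": "NoSQL"})` matches documents whose `tags` array contains that value. The sketch below imitates that behaviour in plain Python over the two sample articles (content field omitted); the function names are hypothetical and this is an illustration, not MongoDB's implementation.

```python
articles = [
    {"_id": "article123", "title": "How to Use MongoDB for CMS", "author": "John Doe",
     "tags": ["MongoDB", "CMS", "NoSQL"], "published_date": "2024-06-03T12:00:00Z"},
    {"_id": "article456", "title": "Best Practices for NoSQL Databases", "author": "Jane Smith",
     "tags": ["NoSQL", "Database", "Best Practices"], "published_date": "2024-06-02T11:00:00Z"},
]

def find_by_tag(docs, tag, fields=("title",)):
    """Return only the requested fields of documents whose tags contain `tag`."""
    return [{f: d[f] for f in fields} for d in docs if tag in d.get("tags", [])]

def newest_first(docs):
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    return sorted(docs, key=lambda d: d["published_date"], reverse=True)
```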
2. Catalogs
Example: Product Catalog

A document-based NoSQL database can be used to store product catalogs, where each
document represents a product with various attributes.

Schema:

Collection: products
Document Fields: name, description, price, category, stock
{
"_id": "product123",
"name": "Laptop",
"description": "High-performance laptop with 16GB RAM and 512GB SSD.",
"price": 999.99,
"category": "Electronics",
"stock": 50
}

{
"_id": "product456",
"name": "Smartphone",
"description": "Latest model smartphone with advanced features.",
"price": 699.99,
"category": "Electronics",
"stock": 150
}
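Catalog queries usually combine several conditions. MongoDB expresses these as a filter document using comparison operators such as `$lt` and `$gte`; the toy matcher below (a hypothetical helper) supports only equality and four comparison operators, over the two sample products, to show the idea.

```python
products = [
    {"_id": "product123", "name": "Laptop", "price": 999.99, "category": "Electronics", "stock": 50},
    {"_id": "product456", "name": "Smartphone", "price": 699.99, "category": "Electronics", "stock": 150},
]

OPERATORS = {
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
}

def matches(doc, query):
    """True if every field condition in `query` holds for `doc`."""
    for field, cond in query.items():
        if field not in doc:
            return False
        if isinstance(cond, dict):  # operator form, e.g. {"$lt": 800}
            if not all(OPERATORS[op](doc[field], arg) for op, arg in cond.items()):
                return False
        elif doc[field] != cond:    # plain equality
            return False
    return True

cheap_electronics = [p["name"] for p in products
                     if matches(p, {"category": "Electronics", "price": {"$lt": 800}})]
```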
3. User-Generated Content
Example: Blog Posts

A document-based NoSQL database can be used to store user-generated blog posts, where each document represents a blog post with various attributes.

Schema:

Collection: blog_posts
Document Fields: title, author, content, comments, posted_date
Sample data
{
"_id": "post123",
"title": "My First Blog Post",
"author": "user123",
"content": "This is the content of my first blog post...",
"comments": [
{"user": "user456", "comment": "Great post!", "date": "2024-06-03T13:00:00Z"},
{"user": "user789", "comment": "Thanks for sharing!", "date": "2024-06-03T14:00:00Z"}
],
"posted_date": "2024-06-03T12:00:00Z"
}
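Because comments are embedded in the post, adding one is a single update to a single document (MongoDB offers the `$push` update operator for appending to an array). The sketch below does the equivalent in plain Python on the sample post (content field omitted); the helper names and the new comment's text and date are hypothetical.

```python
post = {
    "_id": "post123",
    "title": "My First Blog Post",
    "author": "user123",
    "comments": [
        {"user": "user456", "comment": "Great post!", "date": "2024-06-03T13:00:00Z"},
        {"user": "user789", "comment": "Thanks for sharing!", "date": "2024-06-03T14:00:00Z"},
    ],
    "posted_date": "2024-06-03T12:00:00Z",
}

def add_comment(doc, user, text, date):
    # The comment lives inside the post, so no separate comments table or join is needed.
    doc.setdefault("comments", []).append({"user": user, "comment": text, "date": date})

def commenters(doc):
    """Distinct users who commented on the post, sorted."""
    return sorted({c["user"] for c in doc.get("comments", [])})

add_comment(post, "user456", "Looking forward to the next one.", "2024-06-03T15:00:00Z")
```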
Column-family

• Stores data in column families (groups of related columns) rather than in fixed table rows.
• Data is stored with keys that are linked to groups of columns (attributes) – for example, a column family that stores customer details, another that stores orders for that customer, etc.

• Everything about one order


Column-family

• Column family stores allow you to store data with keys mapped to values, with the values grouped into multiple column families, each column family being a map of data.

Typical applications: Time-series data, data warehousing, logging.

Data model: Columns – column families


Strengths: Scalability - Highly scalable and can handle large
amounts of data distributed across many servers.
Weaknesses: Complex Data Model - More complex to design and
manage compared to key-value and document stores.
Examples

• Apache Cassandra, HBase, ScyllaDB.
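A minimal sketch of the column-family data model as nested maps: row key → column family → column → value, following the customer example (one family for customer details, another for that customer's orders). Names and values are hypothetical, and real stores such as HBase or Cassandra add timestamps, persistence and distribution on top of this shape.

```python
# row key -> column family -> column name -> value
table = {}

def put(row_key, family, column, value):
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value

def get_family(row_key, family):
    """Read one column family for a row; the row's other families are not touched."""
    return dict(table.get(row_key, {}).get(family, {}))

put("customer42", "details", "name", "Ann")
put("customer42", "details", "city", "Leeds")
put("customer42", "orders", "order1", "Laptop x1")
put("customer42", "orders", "order2", "Mouse x2")
put("customer77", "details", "name", "Bob")  # rows need not have the same columns
```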


1. Time-Series Data
Example: Sensor Data Collection

A column-family NoSQL database like Apache Cassandra can be used to store sensor data, where each row holds the readings from one sensor at one timestamp, and each column within that row contains a sensor reading.

Schema:

Row Key: sensor_id + timestamp
Columns: temperature, humidity, pressure

Sample data:
Row Key: sensor1_20230603_120000
Columns: temperature: 22.5, humidity: 55.3, pressure: 1012.8
Row Key: sensor1_20230603_121000
Columns: temperature: 22.6, humidity: 55.1, pressure: 1012.6
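With a composite row key of sensor_id + timestamp, keeping keys sorted turns "readings for one sensor in a time window" into a range scan. The sketch below uses the two sample rows and Python's `bisect` module; it is a simplified model, not Cassandra's storage engine, and the helper names are hypothetical.

```python
import bisect

# Sample rows; keys are kept in sorted order, as a column-family store orders row keys.
rows = {
    "sensor1_20230603_120000": {"temperature": 22.5, "humidity": 55.3, "pressure": 1012.8},
    "sensor1_20230603_121000": {"temperature": 22.6, "humidity": 55.1, "pressure": 1012.6},
}
sorted_keys = sorted(rows)

def row_key(sensor_id, timestamp):
    # Timestamp formatted YYYYMMDD_HHMMSS so that string order equals time order.
    return f"{sensor_id}_{timestamp}"

def scan(start_key, end_key):
    """Return (key, columns) pairs for keys in [start_key, end_key)."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, end_key)
    return [(k, rows[k]) for k in sorted_keys[lo:hi]]

def insert(sensor_id, timestamp, columns):
    key = row_key(sensor_id, timestamp)
    if key not in rows:
        bisect.insort(sorted_keys, key)  # keep the key index sorted
    rows[key] = columns
```

In Cassandra specifically, the same effect is usually achieved with sensor_id as the partition key and the timestamp as a clustering column, rather than a concatenated string.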
2. Data Warehousing
Example: Sales Data Storage

A column-family NoSQL database can be used to store large amounts of sales data, where each row represents a different product, and each column represents sales data from different regions or time periods.

Schema:

Row Key: product_id
Columns: sales_q1, sales_q2, sales_q3, sales_q4
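A sketch of aggregating over this layout: summing across one row gives a product's annual total, and summing one column across rows gives a quarter's total. The sales figures are hypothetical; the missing Q3 column for one product illustrates that rows in a column-family store can be sparse.

```python
# Hypothetical sales figures: one row per product_id, one column per quarter.
sales = {
    "product123": {"sales_q1": 120, "sales_q2": 95, "sales_q3": 130, "sales_q4": 210},
    "product456": {"sales_q1": 300, "sales_q2": 280, "sales_q4": 410},  # no Q3 column stored
}
QUARTERS = ["sales_q1", "sales_q2", "sales_q3", "sales_q4"]

def annual_total(product_id):
    # Absent columns are treated as 0.
    row = sales.get(product_id, {})
    return sum(row.get(q, 0) for q in QUARTERS)

def quarter_total(quarter):
    # Read one column across all rows, e.g. total sales for Q4.
    return sum(row.get(quarter, 0) for row in sales.values())
```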
3. Logging
Example: Application Log Storage

A column-family NoSQL database can be used to store application logs, where each row represents a different log entry and each column provides details about the log event.

Schema:

Row Key: log_id + timestamp
Columns: log_level, message, user_id
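A sketch of the log schema: the row key combines a zero-padded log_id with a timestamp, and user_id is only stored when a log entry has one. The log entries and helper names are hypothetical. Filtering on a non-key column such as log_level requires scanning rows here, which is cheap for three entries but costly at scale.

```python
from collections import Counter
from datetime import datetime, timezone

logs = {}  # row key -> columns for that log entry

def log_row_key(log_id: int, ts: datetime) -> str:
    # log_id + timestamp; zero-padding keeps string order equal to id order.
    return f"{log_id:08d}_{ts:%Y%m%dT%H%M%SZ}"

def write_log(log_id, ts, level, message, user_id=None):
    columns = {"log_level": level, "message": message}
    if user_id is not None:
        columns["user_id"] = user_id  # sparse: the column exists only when there is a value
    logs[log_row_key(log_id, ts)] = columns

write_log(1, datetime(2024, 6, 3, 12, 0, tzinfo=timezone.utc), "INFO", "User logged in", "user123")
write_log(2, datetime(2024, 6, 3, 12, 5, tzinfo=timezone.utc), "ERROR", "Payment service timeout")
write_log(3, datetime(2024, 6, 3, 12, 6, tzinfo=timezone.utc), "ERROR", "Payment service timeout", "user123")

level_counts = Counter(cols["log_level"] for cols in logs.values())
user_errors = sorted(k for k, cols in logs.items()
                     if cols["log_level"] == "ERROR" and cols.get("user_id") == "user123")
```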
Graph databases

• Graph databases organize data into node and edge graphs; they work best for data that has complex relationship structures
• Examples: Neo4j, ArangoDB, Amazon Neptune.
Graph
• Graph databases allow you to store entities
and relationships between these entities.
• Entities are also known as nodes, which have
properties. Think of a node as an instance of an object
in the application.
• Relationships are known as edges, which can also have properties. Nodes are organised by relationships, which allow you to find interesting patterns between the nodes.
• The organisation of the graph lets the data be stored
just once and then interpreted in different ways based
on relationships.
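A minimal in-memory property graph, to make "nodes and edges both have properties" concrete. The class, node IDs, labels and properties are hypothetical and loosely follow the order example; a graph database stores and indexes this shape natively, while here it is just dicts and lists.

```python
from collections import defaultdict

class PropertyGraph:
    """Minimal in-memory property graph: nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties
        self.out_edges = defaultdict(list)  # node id -> [(edge type, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, edge_type, target, **props):
        self.out_edges[source].append((edge_type, target, props))

    def neighbours(self, node_id, edge_type=None):
        """Targets of outgoing edges, optionally restricted to one edge type."""
        return [t for etype, t, _ in self.out_edges[node_id]
                if edge_type is None or etype == edge_type]

g = PropertyGraph()
g.add_node("customer42", label="Customer", name="Ann")
g.add_node("order1", label="Order")
g.add_node("laptop", label="Product", name="Laptop")
g.add_edge("customer42", "PLACED", "order1", date="2024-06-03")
g.add_edge("order1", "CONTAINS", "laptop", quantity=1)

# Traverse customer -> orders -> products by following relationships.
bought = [p for o in g.neighbours("customer42", "PLACED") for p in g.neighbours(o, "CONTAINS")]
```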
Graph

Typical applications: Social networks, recommendation engines, fraud detection.

Data model: Graph (nodes and edges)

Strengths: Relationships - Excellent for handling complex relationships between data entities.
Weaknesses: Scalability - Horizontal scalability can be challenging compared to other NoSQL models, especially for very large datasets.
1. Social Networks
Example: User Relationships

A graph NoSQL database like Neo4j can model social networks by representing users as nodes and their relationships (e.g., friendships) as edges.

Schema:

Nodes: User
Edges: FRIENDS_WITH
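Two typical social-network queries, sketched in plain Python over hypothetical users: friend suggestions (friends of friends) and degrees of separation via breadth-first search. FRIENDS_WITH edges are treated as undirected here, which is an assumption of this sketch.

```python
from collections import deque

# Hypothetical FRIENDS_WITH edges, treated as undirected.
friendships = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]

graph = {}
for a, b in friendships:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def friends_of_friends(user):
    """Users exactly two FRIENDS_WITH hops away who are not already friends."""
    direct = graph.get(user, set())
    return {fof for f in direct for fof in graph[f]} - direct - {user}

def degrees_of_separation(start, goal):
    """Breadth-first search for the shortest FRIENDS_WITH path length, or None."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        user, dist = queue.popleft()
        if user == goal:
            return dist
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None
```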
2. Recommendation Engines
Example: Product Recommendations

A graph NoSQL database can be used to store and query product recommendations, where users and products are nodes, and relationships indicate purchase history or interest.

Schema:

Nodes: User, Product
Edges: PURCHASED, INTERESTED_IN
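A sketch of a simple "customers who bought this also bought" recommendation over PURCHASED and INTERESTED_IN edges. The users, products and the ranking rule (count of purchases by similar users) are hypothetical choices for illustration, not the method of any particular recommendation engine.

```python
from collections import Counter

# Hypothetical edges as (user, product) pairs.
purchased = [
    ("user1", "laptop"), ("user1", "mouse"),
    ("user2", "laptop"), ("user2", "mouse"), ("user2", "bag"),
    ("user3", "laptop"), ("user3", "bag"),
    ("user4", "phone"),
]
interested_in = [("user4", "laptop")]

def recommend(user, limit=3):
    # Start from products the user bought or showed interest in.
    seeds = {p for u, p in purchased + interested_in if u == user}
    owned = {p for u, p in purchased if u == user}
    # Other users who bought any of those products.
    similar = {u for u, p in purchased if p in seeds and u != user}
    # Rank what those users bought that this user does not own yet.
    scores = Counter(p for u, p in purchased if u in similar and p not in owned)
    return [p for p, _ in scores.most_common(limit)]
```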
3. Fraud Detection
Example: Transaction Monitoring

A graph NoSQL database can be used to detect fraud by modeling transactions (and the users who make them) as nodes and the relationships between them as edges, to identify patterns of suspicious behavior.

Schema:

Nodes: Transaction, User
Edges: MADE_BY, CONNECTED_TO

Sample data:

Nodes:
Transaction: {id: "trans001", amount: 500.00, timestamp: "2024-06-03T12:00:00Z", location: "New York"}
Transaction: {id: "trans002", amount: 1500.00, timestamp: "2024-06-03T12:10:00Z", location: "London"}
User: {id: "user123", name: "John Doe"}

Edges:
(trans001)-[MADE_BY]->(user123)
(trans002)-[MADE_BY]->(user123)
(trans001)-[CONNECTED_TO]->(trans002)
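A sketch of one possible fraud rule over the sample data: flag pairs of transactions MADE_BY the same user in different locations within a short time window (here two transactions ten minutes apart in New York and London). The rule, the one-hour default window and the function names are hypothetical; real fraud detection combines many such patterns.

```python
from datetime import datetime, timedelta
from itertools import combinations

transactions = {
    "trans001": {"amount": 500.00, "timestamp": "2024-06-03T12:00:00Z", "location": "New York"},
    "trans002": {"amount": 1500.00, "timestamp": "2024-06-03T12:10:00Z", "location": "London"},
}
made_by = {"trans001": "user123", "trans002": "user123"}  # MADE_BY edges

def parse(ts):
    # fromisoformat does not accept a trailing "Z" before Python 3.11, so map it to +00:00.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def suspicious_pairs(window=timedelta(hours=1)):
    """Pairs of transactions by the same user, in different locations, within `window`."""
    flagged = []
    for a, b in combinations(sorted(transactions), 2):
        if made_by[a] != made_by[b]:
            continue
        ta, tb = transactions[a], transactions[b]
        gap = abs(parse(tb["timestamp"]) - parse(ta["timestamp"]))
        if ta["location"] != tb["location"] and gap <= window:
            flagged.append((a, b))
    return flagged
```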
Types of NoSQL databases
Reasons to use NoSQL
1. To improve programmer productivity by using a database that better matches an application's needs.
  • e.g. removing impedance mismatch by storing objects together in aggregates rather than splitting them up into relational tables

2. To improve data access performance via some combination of handling larger data volumes, reducing latency, and improving throughput.
  • When a database is large enough to be split over several database servers, NoSQL may be a good option
Reasons to stick with Relational DBs

• They are well-known, so it is easier to find people with experience of using them

• The technology is more mature and less likely to encounter problems

• Many other tools are built on relational technology

"A DBA walks into a NOSQL bar, but turns and leaves because he couldn't find a table"
Polyglot persistence

• Polyglot: the ability to speak multiple languages

• It is predicted that in the future, developers will make use of a range of different technologies for the persistence (storage) of data

• Relational databases and types of NoSQL databases can be utilised as necessary to solve the particular problems to which they are best suited
