Lecture 3 - Introduction To NoSQL - Updated

The document provides an introduction to NoSQL databases, describing their benefits over relational databases for modern data needs. It covers key aspects of NoSQL databases including their schema-less data model, distributed architecture, and BASE properties. It also describes different types of NoSQL databases like key-value stores, document databases, graph databases and column-oriented databases.


Introduction to NoSQL Databases
CS 537 - Big Data Analytics
Dr. Faisal Kamiran
March 22, 2021
RELATIONAL DATABASES
Benefits of Relational databases:
• Designed for all purposes
• ACID
• Strong consistency, concurrency, recovery
• Mathematical background
• Standard Query language (SQL)
• Many supporting tools, e.g. reporting services and entity frameworks

DR. FAISAL KAMIRAN INFORMATION TECHNOLOGY UNIVERSITY


RELATIONAL DATABASES

In general, RDBMSs have been considered the one-size-fits-all solution for data retrieval and persistence for decades.

RELATIONAL DATABASES - CHALLENGES

• Dramatic decrease in storage costs, leading to an exponential rise in data applications
• Variations in data
• Continuously evolving schema as requirements change
• Single point of failure

RELATIONAL DATABASES - CHALLENGES
Relational databases were not built for distributed applications, because:
• Joins are expensive
• Hard to scale horizontally
• Expensive (product cost, hardware, maintenance)

MODERN DATA REQUIREMENTS

• Explosion of social media sites (Facebook, Twitter) with large data needs

• Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)

• Constantly changing requirements

• High-Velocity Data requiring fast query processing

• Increasingly sparse and semi-structured data

NOSQL DATABASES

NoSQL is read as:

• Non-relational
• No RDBMS
• Not Only SQL
  ◦ SQL-like query languages may still be used

NoSQL is an umbrella term for all databases and data stores that do not follow
the RDBMS principles

NOSQL DATABASES

“Next Generation Database Management Systems mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.”
-- Nosql-database.org

The primary objective of a NoSQL Database is to have:
• Simplicity of design
• Horizontal scaling
• Finer control over availability

CHARACTERISTICS OF NOSQL DATABASES
THEY AVOID:
• Overhead of ACID transactions
• Complexity of SQL queries
• Burden of up-front schema design
• DBA presence
• Transactions (these should be handled at the application layer)

THEY ALLOW:
• Easy and frequent changes to the DB
• Fast development
• Large data volume
• Schema-less
• Distributed

SCHEMA BASED DATA MODELLING

• In an RDBMS, a schema describes every functional element, including tables and rows

• Exerts a high degree of control and prevents capture of low-quality data

SCHEMA BASED DATA MODELLING
Problems:
• Cannot add a record which does not fit the schema
• Need to add NULLs to unused items in a row
• Need to consider the datatypes: cannot add a string to an integer field
• Cannot add multiple items in a field

SCHEMALESS DATA MODEL

In NoSQL Databases:
• There is no schema to consider – No need to conform to a rigid schema
• There is no unused cell
• There are no datatype enforcements on columns
• Most of the considerations are done in the application layer
• Data can be rapidly transformed as requirements change
• Facilitates the storage of unstructured data as well as structured data

SCHEMALESS DATA MODEL

Information is stored in JSON-style documents, which can have varying sets of fields with different data types for each field:
{
"name":"Joe",
"age":30,
"interests":"football"
}
{
"name":"Kate",
"age":25
}
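
The two documents above can sit in the same collection even though only one has an "interests" field. A minimal Python sketch of how an application might cope with that, assuming the documents arrive as JSON strings (the describe helper and the "none recorded" default are illustrative choices, not part of the lecture):

```python
import json

# Two documents from the same (hypothetical) collection.
# They do not share an identical set of fields.
raw_documents = [
    '{"name": "Joe", "age": 30, "interests": "football"}',
    '{"name": "Kate", "age": 25}',
]

people = [json.loads(doc) for doc in raw_documents]


def describe(person):
    """Build a one-line summary, tolerating a missing 'interests' field."""
    # The application, not the database, decides what a missing field means.
    interests = person.get("interests", "none recorded")
    return f"{person['name']} ({person['age']}): {interests}"


summaries = [describe(p) for p in people]
for line in summaries:
    print(line)
```

No NULL placeholder is stored for Kate; the missing field is simply absent, and the check moves into application code.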

SCHEMALESS DATA MODEL - ADVANTAGES

• No pre-defined database schemas
• No data truncation
• Suitable for real-time analytics
• On-demand scalability to meet extreme Volume, Velocity and Variety of data

DISTRIBUTED ARCHITECTURE

• Multiple NoSQL database instances can be created in a distributed fashion
• Data is physically stored across different sites
• Replicas reach eventual consistency
• Offers auto-scaling and fail-over capabilities

BASE (NOT ACID)

Recall ACID, the desired properties of transactions in an RDBMS:
• Atomicity, Consistency, Isolation, and Durability

NoSQL systems typically provide BASE rather than ACID:

• Basically Available
• Soft state
• Eventually consistent

ACID VS. BASE

The idea is that by giving up ACID constraints, one can achieve much
higher performance and scalability
The systems differ in how much they give up
◦ e.g. most of the systems call themselves “eventually consistent”, meaning
that updates are eventually propagated to all nodes
◦ but many of them provide mechanisms for some degree of consistency, such
as multi-version concurrency control (MVCC)
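
To make "eventually consistent" concrete, here is a toy Python sketch of three replicas that accept versioned writes and converge after a synchronisation pass. The Replica class, the sync function and the integer version numbers are all assumptions made for illustration; real systems use more elaborate mechanisms.

```python
class Replica:
    """One node holding a copy of the data as key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Keep the write only if it is newer than what this replica has.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(replicas):
    """Propagation pass: every replica adopts the newest version of each key."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, version)


a, b, c = Replica("a"), Replica("b"), Replica("c")
a.write("cart", ["book"], version=1)
sync([a, b, c])

# A new write lands on one replica only; the others are briefly stale.
b.write("cart", ["book", "pen"], version=2)
stale_read = a.read("cart")   # still the old value
sync([a, b, c])
fresh_read = a.read("cart")   # converged after propagation
```

Between the second write and the second sync, a reader on replica a sees old data. That window is what is traded for availability and scalability.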

CAP THEOREM
Eric Brewer’s CAP theorem is often cited for NoSQL.
A distributed system can provide only two of the following three properties at the same time:
◦ Consistency – Data is the same across all sites, even after updates and deletions
◦ Availability – Data is always immediately available
◦ Partition-Tolerance – System continues to work even in the presence of a partial network failure

TYPES OF NOSQL DATABASES

KEY-VALUE STORE

Simplest NoSQL database
Data is stored in the form of a key-value hash table
◦ Each key is unique
◦ Its corresponding value can be any data type (string, JSON, blob, etc.)
◦ Values can also contain nested key-value pairs
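
The key-value model can be sketched in a few lines of Python on top of a dict, which is itself a hash table. The KeyValueStore class and the "user:1:..." key naming are hypothetical, chosen only to illustrate the bullets above.

```python
class KeyValueStore:
    """A toy in-memory key-value store backed by a hash table (dict)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # Keys are unique: writing an existing key replaces its value.
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        self._table.pop(key, None)


store = KeyValueStore()
store.put("user:1:name", "Joe")                          # string value
store.put("user:1:avatar", b"\x89PNG")                   # binary blob
store.put("user:1:profile",
          {"age": 30, "address": {"city": "Lahore"}})    # nested key-value pairs

store.put("user:1:name", "Joseph")                       # overwrite, not a duplicate
city = store.get("user:1:profile")["address"]["city"]
```

The store only looks values up by key. It has no idea what is inside a value, which is the main difference from the document-oriented model.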

DOCUMENT-ORIENTED

Subclass of key-value store
Assumes a certain internal document structure in the data
A query language provides the ability to perform queries based on this internal structure
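
As a rough illustration of querying on internal structure, the Python sketch below filters a list of documents by field values, including a nested field addressed with a dotted path. The find and get_path helpers and the sample data are assumptions for this example; they only imitate the kind of filter a document database's query language offers.

```python
def get_path(document, path):
    """Follow a dotted path such as 'address.city'; return None if absent."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, criteria):
    """Return the documents whose fields equal every value in criteria."""
    return [
        doc for doc in collection
        if all(get_path(doc, path) == expected
               for path, expected in criteria.items())
    ]


people = [
    {"_id": 1, "name": "Joe", "age": 30, "address": {"city": "Lahore"}},
    {"_id": 2, "name": "Kate", "age": 25},
    {"_id": 3, "name": "Ali", "age": 30, "address": {"city": "Karachi"}},
]

in_lahore = find(people, {"address.city": "Lahore"})
aged_30 = find(people, {"age": 30})
```

Documents without an "address" field are skipped rather than causing an error, so the query still works over documents with varying fields.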

GRAPH BASED

• Used to store fine-grained networks of inter-connected data
• Stores entities as well as the relations amongst the entities
• Each entity is stored as a node, with relationships as edges
• Traversing persisted relationships is faster than recomputing them at query time
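
A small Python sketch of the idea, assuming an in-memory adjacency list in which each node keeps its outgoing edges (the names and the FRIEND/LIKES relation types are made up for illustration). Because the edges are stored with the node, a traversal just follows them instead of matching rows across tables.

```python
from collections import deque

# Each edge is stored with the node it starts from: node -> [(relation, node)].
edges = {
    "Joe": [("FRIEND", "Kate"), ("LIKES", "Football")],
    "Kate": [("FRIEND", "Ali")],
    "Ali": [("LIKES", "Football")],
}


def reachable(start, relation, max_hops):
    """Breadth-first traversal following only edges of one relation type."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, neighbour in edges.get(node, []):
            if rel == relation and neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, hops + 1))
    return found


direct_friends = reachable("Joe", "FRIEND", max_hops=1)
friends_of_friends = reachable("Joe", "FRIEND", max_hops=2)
```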

COLUMN BASED

• Based on the BigTable paper by Google
• Variable-width tables
• Rows do not need to have the same columns
• Columns can be added to any row without having to add them to other rows
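
One way to picture a variable-width table is as a map from row key to that row's own map of columns. The Python sketch below is only a mental model under that assumption (the put and get helpers and the sample rows are hypothetical); it says nothing about how a real column store lays data out on disk.

```python
# row key -> {column name -> value}; each row carries only its own columns.
table = {}


def put(row_key, column, value):
    """Set one column on one row, creating the row if needed."""
    table.setdefault(row_key, {})[column] = value


def get(row_key, column, default=None):
    """Read one column of one row; absent columns yield the default."""
    return table.get(row_key, {}).get(column, default)


put("joe", "age", 30)
put("joe", "interests", "football")
put("kate", "age", 25)
put("kate", "email", "kate@example.com")  # new column, only on this row

columns_per_row = {key: sorted(cols) for key, cols in table.items()}
```

Adding "email" to the "kate" row leaves the "joe" row untouched, so no NULL padding is needed.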

WHEN TO USE A NOSQL DATABASE?

• Large amounts of data
  ◦ Terabytes and petabytes of data
• Need horizontal scalability
• Need high throughput – fast reads
• Need a flexible schema
  ◦ No fixed number of columns
• Need high availability
• Need to be able to store different data type formats
• Users are distributed – low latency
• Need redundancy in case of failures

WHEN NOT TO USE A NOSQL DATABASE?

• Need ACID transactions
• Need the ability to do JOINs
• Need the ability to do aggregations and analytics
• Have changing business requirements
• Queries are not known in advance and need to stay flexible
• Have a small dataset

APACHE CASSANDRA

“Apache Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.”

Apache Cassandra uses its own query language, CQL (Cassandra Query Language).

FEATURES OF CASSANDRA

• Elastic Scalability

• Always on architecture

• Fast linear scale performance

• Flexible data storage

• Fast writes

COMPANIES USING CASSANDRA
Netflix uses Apache Cassandra to serve all their videos to customers.
Uber uses Apache Cassandra for their entire backend.

BASICS OF CASSANDRA

Keyspace
◦ Collection of Tables

Table
◦ A group of partitions

Rows
◦ A single item

THE BASICS OF APACHE CASSANDRA
Partition
◦ Fundamental unit of access
◦ Collection of row(s)
◦ How data is distributed

Primary Key
◦ Primary key is made up of a partition key and clustering columns

Columns
◦ Clustering and Data
◦ Labeled element
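
The relationship between partition key, clustering column and row can be modelled roughly in Python as below. This is a conceptual sketch, not Cassandra's implementation: the node names, the MD5-modulo placement and the sensor data are all assumptions, standing in for Cassandra's own partitioner and token ring.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key):
    """Hash the partition key to pick a node (stand-in for the real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


# partition key -> list of (clustering value, data columns), kept sorted
partitions = {}


def insert(partition_key, clustering_value, data):
    """Store a row; the primary key is (partition key, clustering value)."""
    rows = partitions.setdefault(partition_key, [])
    # Writing an existing primary key replaces the old row.
    rows[:] = [row for row in rows if row[0] != clustering_value]
    rows.append((clustering_value, data))
    rows.sort(key=lambda row: row[0])  # rows are ordered by clustering column


def read_partition(partition_key):
    """Fetch every row of one partition, in clustering order."""
    return partitions.get(partition_key, [])


insert("sensor-1", "2021-03-22T10:05", {"temp": 21.5})
insert("sensor-1", "2021-03-22T10:00", {"temp": 21.0})
insert("sensor-2", "2021-03-22T10:00", {"temp": 19.2})
insert("sensor-1", "2021-03-22T10:05", {"temp": 22.0})  # same primary key

timestamps = [ts for ts, _ in read_partition("sensor-1")]
latest = read_partition("sensor-1")[-1][1]
```

All rows for "sensor-1" live in one partition and therefore on one node, which is why the partition is the fundamental unit of access and the basis for how data is distributed.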

Summary

DEMO
