2024-Lesson 8 SQL NoSQL Intro
Non-Relational Data
Last Review: 15th February 2024
NoSQL - Lesson 8
Why? What? When?
• McCreary, Dan & Kelly, Ann (2013), Making Sense of NoSQL: A guide for
1st call
SQL Project (30%) – Date: 5th May
NoSQL Challenges (30%) – 19th May | 2nd June | 9th June
Groups of up to four students
Exam (40%) – 3rd July
2nd call
Exam (100%) – 16th July
Or
SQL Project (30%) + NoSQL Challenges (30%) + Exam (40%)
Get started with Azure Cosmos DB for NoSQL - Training | Microsoft Learn
• Release: April 22 (now available)
• Deadline: May 10
• Open ‘Microsoft Azure -> Subscriptions’ and confirm your subscription is active
• History of NoSQL
• NoSQL Vs RDBMS
• Benefits of NoSQL
• When NoSQL?
• Types of NoSQL Databases
One of the disadvantages of NoSQL is that decision making is more challenging. This
course is designed to lessen that challenge. After taking this course, you should understand
NoSQL options and when to use them.
1951: Magnetic Tape
1955: Magnetic Disk
1961: ISAM
1965: Hierarchical model
1968: IMS
1969: Network Model
1970: Codd’s Paper
1971: IDMS
1974: System R
1978: Oracle
1980: Commercial Ingres
1984: DB2
1987: Sybase
1989: Postgres
1989: SQL Server
1995: MySQL
2003: MarkLogic
2004: MapReduce
2005: Hadoop
2005: Vertica
2007: Dynamo
2008: Cassandra
2008: HBase
2008: NuoDB
2009: MongoDB
2010: VoltDB
2010: HANA
2011: Riak
2012: Aerospike
2014: Splice Machine
Flat Files – Developers would create a file and lay out information in that file.
A block is a chunk of data read by a magnetic tape or disk drive in a single
read operation.
Random access to blocks on tape can take more time than sequential access
because there can be more tape movement relative to the amount of data
read.
Random access is more efficient on disk drives. Read-write heads of disk drives
may need to move to be in the correct position to read a data block, but there
is less movement than with tapes. Disk read-write heads only need to move at
most the radius of the disk. Tape drives may need to move the full length of a
tape to retrieve a data block.
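The difference between sequential and random access can be sketched with a small flat file of fixed-width records. This is only an illustration: the record layout (`RECORD`), the customer fields and the helper names are assumptions made for the example, and an in-memory byte stream stands in for a tape or disk.

```python
import io
import struct

# Hypothetical fixed-width layout: 4-byte id, 20-byte name, 8-byte balance.
RECORD = struct.Struct("<i20sd")

def write_records(stream, rows):
    """Append each (id, name, balance) row as one fixed-width record."""
    for cust_id, name, balance in rows:
        stream.write(RECORD.pack(cust_id, name.encode("ascii"), balance))

def read_sequential(stream, target_id):
    """Scan from the start, as a tape drive would, counting records read."""
    stream.seek(0)
    reads = 0
    while chunk := stream.read(RECORD.size):
        reads += 1
        cust_id, name, balance = RECORD.unpack(chunk)
        if cust_id == target_id:
            return name.rstrip(b"\x00").decode("ascii"), balance, reads
    return None

def read_random(stream, position):
    """Jump straight to the record at a known position, as a disk can."""
    stream.seek(position * RECORD.size)
    cust_id, name, balance = RECORD.unpack(stream.read(RECORD.size))
    return cust_id, name.rstrip(b"\x00").decode("ascii"), balance

flat_file = io.BytesIO()
write_records(flat_file, [(1, "Ana", 10.0), (2, "Bruno", 25.5), (3, "Carla", 7.25)])
```

Note that every program reading this file has to know the layout in `RECORD`; the file itself carries no description of its own structure.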
This graph has cycles and, therefore, is not a directed acyclic graph and not a
model of a network data management system.
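Whether a graph qualifies as a directed acyclic graph can be checked mechanically. The sketch below uses Kahn's algorithm (repeatedly removing nodes that have no incoming edges); the function name `is_dag` and the edge-list representation are assumptions chosen for the example, not something taken from the slides.

```python
def is_dag(edges):
    """Return True if the directed graph given as (parent, child) pairs has no cycle."""
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Start from nodes nothing points to, and peel the graph layer by layer.
    ready = [node for node in nodes if indegree[node] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Nodes that sit on a cycle never reach indegree 0, so they are never visited.
    return visited == len(nodes)
```

A node may have several parents and the graph still passes the check; only a path that leads back to its own starting node makes it fail.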
• Although network and hierarchical data management systems improved on flat file data
management systems, data management technology did not change radically until 1970,
when E. F. Codd published a paper on the design of a new type of database.
• Relational databases separated the logical organization of data structures from the
physical storage of those structures. Codd and others developed rules for designing
relational databases that eliminated the potential for some types of data anomalies, such
as inconsistent data.
• A Structured Query Language (SQL) was a great advantage over flat files
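As a rough illustration of the separation between logical organization and physical storage, the sketch below uses Python's built-in `sqlite3` module. The `customer` table and its rows are invented for the example. The query states what is wanted and says nothing about file layout, blocks or record positions.

```python
import sqlite3

# An in-memory database; where and how rows are stored is the engine's concern.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ana", "Lisbon"), (2, "Bruno", "Porto"), (3, "Carla", "Lisbon")],
)

# Declarative query: no offsets, no record sizes, no scan loop in application code.
rows = conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY name", ("Lisbon",)
).fetchall()
lisbon_names = [row[0] for row in rows]
```

With a flat file, the same question would require application code that knows the record layout and scans the file itself.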
“NoSQL is a set of concepts that allows the rapid and efficient processing of
data sets with a focus on performance, reliability and agility”
McCreary, Dan & Kelly, Ann (2013). Making Sense of NoSQL.
Elastic Scaling
• RDBMS scale up – a bigger load requires a bigger server
• NoSQL scale out – distribute data across multiple hosts (e.g. Black Friday,
Christmas, when load increases by orders of magnitude)
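One simple way to picture scaling out is hash-based partitioning: each key is hashed and the hash decides which host stores it. The sketch below is a minimal illustration under that assumption; the host names, the `order-…` keys and the functions `host_for` and `distribute` are hypothetical and do not describe how any particular product places data.

```python
import hashlib

def host_for(key, hosts):
    """Pick a host for a key from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]

def distribute(keys, hosts):
    """Group keys by the host that would store them."""
    placement = {host: [] for host in hosts}
    for key in keys:
        placement[host_for(key, hosts)].append(key)
    return placement

hosts = ["host-a", "host-b", "host-c"]  # hypothetical cluster members
placement = distribute([f"order-{i}" for i in range(1000)], hosts)
```

Adding a host to the list spreads the same keys over more machines. With plain modulo hashing most keys move when the host count changes, which is why distributed stores often use consistent hashing instead.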
Big Data
• Huge increase in data
• NoSQL designed for big data (High Volume, Velocity and Variety of data)
Flexible Models
• RDBMS: schema changes must be managed
• NoSQL databases are more relaxed about the structure of data (schemaless)
• In an RDBMS:
  • You can’t add a record that does not fit the schema
  • Every column needs a value, even for unused items in a row (e.g. NULL)
  • Data types must be respected (you can’t add a string to an integer column)
  • You cannot store multiple items in a single field
• In NoSQL, we can gather all items in an aggregate (e.g. a document)
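The contrast between a fixed schema and a document aggregate can be sketched as follows. The `product` table, the field names and the sample documents are assumptions made for the example. `sqlite3` stands in for an RDBMS and plain JSON documents stand in for a document store.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)"
)

def insert_with_unknown_column():
    """Try to store a field the schema does not declare; report whether it worked."""
    try:
        conn.execute(
            "INSERT INTO product (id, name, price, tags) "
            "VALUES (1, 'Lamp', 19.9, 'home')"
        )
        return True
    except sqlite3.OperationalError:
        # The table has no 'tags' column, so the statement is rejected.
        return False

# Documents: each aggregate carries its own fields, including multi-valued ones.
documents = [
    {"_id": 1, "name": "Lamp", "price": 19.9, "tags": ["home", "lighting"]},
    {"_id": 2, "name": "E-book", "price": 4.5, "formats": ["epub", "pdf"], "pages": 120},
]
serialized = [json.dumps(doc) for doc in documents]
```

The two documents have different fields and neither needs a placeholder for the fields it lacks. The cost is that the application, not the database, must cope with documents of different shapes.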
Economics
• Clusters of cheap commodity servers to manage data and transaction volume
• Cost per gigabyte or per transaction/second for NoSQL can be lower than the
cost for an RDBMS
DBA Specialists
• Experts required to monitor RDBMS
• NoSQL requires less management: automatic repair and simpler data models
*We don’t use the term scale down for Vertical Scaling
What is NoSQL
• More than rows in tables
• Free of joins
• Schemaless
• Works on many processors
• Uses shared-nothing commodity computers
• Innovative
What NoSQL is not
• Not about SQL language
• Not only about open source
• Not only big data
• Not about cloud computing
• Not about clever use of RAM and SSD
• Not an elite group of products => you only need to convince others you have
innovative solutions to their business problems.
Schemaless
• Applications written to deal with specific documents (JSON)
• Designed to handle distributed, large databases
Trade-offs:
• NoSQL not designed for ad hoc query of data
• Designed for speed and growth of database
• Relaxation of the ACID properties
  • Atomicity
  • Consistency
  • Isolation
  • Durability
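To make concrete what is being relaxed, the sketch below shows atomicity in a relational engine, using Python's built-in `sqlite3`. The `account` table, the balances and the `transfer` helper are invented for the example. A transfer either applies both updates or neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("ana", 100), ("bruno", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the credit made above was rolled back too.
        return False

def balances():
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
```

Calling `transfer("ana", "bruno", 500)` fails the balance check and leaves both accounts unchanged, even though the credit to `bruno` had already been executed inside the transaction. A store that relaxes atomicity across records gives up this guarantee in exchange for speed and easier distribution.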