
2024-Lesson 8 SQL NoSQL Intro

The document outlines a course on managing relational and non-relational data, focusing on SQL Server and NoSQL databases. It details learning outcomes, evaluation criteria, and challenges, along with a historical overview of database technologies and the benefits of NoSQL. The course aims to equip students with the knowledge to effectively use and implement various database systems, particularly in Azure environments.


Managing Relational and Non-Relational Data
Last Review: 15th February 2024

NoSQL - Lesson 8
Why? What? When?

Post-Graduation in Enterprise Data Science & Analytics

Classified as Microsoft Confidential


Curricular Unit

Managing Relational Data


João Loureiro
E-mail: [email protected]

Managing Non-Relational Data


António Sérgio Azevedo
E-mail: [email protected]
André Ferreira
E-mail: [email protected]



Learning outcomes

Managing Relational Data (14h)

• Work with SQL Server and Azure databases

• Work with data using T-SQL (Transact-SQL) language

• Design and implement a logical data model using SQL Server

Managing Non-Relational Data (14h)

• Learn when to use the different types of NoSQL databases

• Design and deploy NoSQL databases on Azure



Bibliography

• Ben-Gan, Itzik (2016), T-SQL Fundamentals, 3rd Edition. Microsoft Press.

• Korotkevitch, Dmitri (2016), Pro SQL Server Internals. Apress.

• Sullivan, Dan (2015), NoSQL for Mere Mortals. Addison-Wesley.

• McCreary, Dan & Kelly, Ann (2013), Making Sense of NoSQL: A guide for managers and the rest of us. Manning Publications.



Evaluation

To complete the course successfully, students must obtain a final grade of at least 9.5 points (1st or 2nd call):

1st call
SQL Project (30%) – Date: 5th May
NoSQL Challenges (30%) – 19th May | 2nd June | 9th June
Groups of up to four students
Exam (40%) – 3rd July
2nd call
Exam (100%) – 16th July
Or
SQL Project (30%) + NoSQL Challenges (30%) + Exam (40%)
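The 1st-call weighting can be checked with a little arithmetic. The Python sketch below assumes every component is graded on the same 0 to 20 scale (implied by the 9.5 pass mark but not stated here); the function and key names are illustrative only.

```python
# Weights of the 1st-call components, as listed above.
WEIGHTS = {"sql_project": 0.30, "nosql_challenges": 0.30, "exam": 0.40}
PASS_MARK = 9.5  # minimum final grade (assumed 0-20 scale)


def final_grade(scores: dict[str, float]) -> float:
    """Weighted sum of the component scores, each on the same scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def passed(scores: dict[str, float]) -> bool:
    """True if the weighted final grade reaches the pass mark."""
    return final_grade(scores) >= PASS_MARK


example = {"sql_project": 14.0, "nosql_challenges": 12.0, "exam": 6.0}
print(round(final_grade(example), 2), passed(example))  # 10.2 True
```

Here 4.2 + 3.6 + 2.4 = 10.2, which clears the 9.5 pass mark.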



Challenges – 30% of the Final Grade

Challenge 1 (20%) – NoSQL Concepts and Key-Value DB


• Release: May 10
• Deadline: May 19

Challenge 2 (40%) – NoSQL Document DB


• Release: May 17
• Deadline: June 2

Challenge 3 (40%) – NoSQL Column DB


• Release: May 24
• Deadline: June 9
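Each challenge's percentage applies inside the 30% challenge component, so its effective share of the final grade is the product of the two percentages. A minimal Python sketch, with hypothetical key names for the three challenges:

```python
# Weight of each challenge within the challenge component (slide values).
CHALLENGE_WEIGHTS = {"key_value": 0.20, "document": 0.40, "column": 0.40}
COMPONENT_SHARE = 0.30  # the challenge component's share of the final grade


def challenges_score(scores: dict[str, float]) -> float:
    """Combined challenge score, on the same scale as the individual scores."""
    return sum(CHALLENGE_WEIGHTS[c] * scores[c] for c in CHALLENGE_WEIGHTS)


def share_of_final(challenge: str) -> float:
    """Effective weight of a single challenge in the final grade."""
    return CHALLENGE_WEIGHTS[challenge] * COMPONENT_SHARE


print(challenges_score({"key_value": 15.0, "document": 12.0, "column": 10.0}))
print({c: round(share_of_final(c), 2) for c in CHALLENGE_WEIGHTS})
```

Challenge 1 therefore counts for 0.20 × 30% = 6% of the final grade, and Challenges 2 and 3 for 12% each.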



Challenge 0 (0.5 extra value): optional and individual

Get started with Azure Cosmos DB for NoSQL (Microsoft Learn training)
• Release: April 22 (now available)
• Deadline: May 10



Azure for Students

• Activate your free Azure for Students:


https://azure.microsoft.com/en-us/free/students/

• Open ‘Microsoft Azure -> Subscriptions’ and confirm your subscription is active

• To check your remaining credit, visit


https://www.microsoftazuresponsorships.com/



Next Sessions Plan

• NoSQL: Why? What? When? (Lesson 8 & 9)


• Key-Value Databases (Lesson 10)
• Document Databases (Lesson 11 & 12)
• Column Family Databases (Lesson 13)
• Graph Databases (Lesson 14)



What is your Background?



Agenda - NoSQL: Why? What? When?

• History of NoSQL
• NoSQL vs. RDBMS
• Benefits of NoSQL
• When NoSQL?
• Types of NoSQL Databases



Objective

One of the disadvantages of NoSQL is that decision making is more challenging. This
course is designed to lessen that challenge. After taking this course, you should understand
NoSQL options and when to use them.

NoSQL For Mere Mortals



A bit of history

Pre-Relational (1950 – 1972)
• 1951: Magnetic tape
• 1955: Magnetic disk
• 1961: ISAM
• 1965: Hierarchical model
• 1968: IMS
• 1969: Network model
• 1971: IDMS

Relational (1972 – 2005)
• 1970: Codd’s paper
• 1974: System R
• 1978: Oracle
• 1980: Commercial Ingres
• 1984: DB2
• 1987: Sybase
• 1989: Postgres
• 1989: SQL Server
• 1995: MySQL

The Next Generation (2005 – 2015)
• 2003: MarkLogic
• 2004: MapReduce
• 2005: Hadoop
• 2005: Vertica
• 2007: Dynamo
• 2008: Cassandra
• 2008: HBase
• 2008: NuoDB
• 2009: MongoDB
• 2010: VoltDB
• 2010: HANA
• 2011: Riak
• 2012: Aerospike
• 2014: Splice Machine



A bit of history – Flat File Data Management Systems

Flat Files – Developers would create a file and lay out information in that file.
The image below represents a chunk of data read by a magnetic tape or disk
drive in a single read operation.
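To make a developer-defined layout concrete, here is a Python sketch of a flat file with fixed-width records. The record format (4-byte integer id, 20-byte name, 4-byte float balance) and the helper names are hypothetical. The point is that the program, not a database, knows where each field lives; because every record has the same size, record n can be reached by seeking, and the file can also be read sequentially in fixed-size blocks. On a modern file system this does not reproduce tape or disk timing.

```python
import os
import struct
import tempfile

# Hypothetical record layout chosen by the developer, not by any DBMS:
# little-endian int32 id, 20-byte ASCII name, float32 balance (28 bytes in total).
RECORD = struct.Struct("<i20sf")
RECORDS_PER_BLOCK = 4  # records fetched by one read, mimicking a block read


def write_records(path: str, rows: list[tuple[int, str, float]]) -> None:
    """Write each row as one fixed-width record; struct pads/truncates the name."""
    with open(path, "wb") as f:
        for rid, name, balance in rows:
            f.write(RECORD.pack(rid, name.encode("ascii"), balance))


def read_record(path: str, index: int) -> tuple[int, str, float]:
    """Random access: jump straight to record `index` since all records are equal-sized."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        rid, raw_name, balance = RECORD.unpack(f.read(RECORD.size))
    return rid, raw_name.rstrip(b"\0").decode("ascii"), balance


def read_blocks(path: str):
    """Sequential access: yield the file one block of records at a time."""
    block_size = RECORD.size * RECORDS_PER_BLOCK
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            yield [RECORD.unpack_from(chunk, off) for off in range(0, len(chunk), RECORD.size)]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "accounts.dat")
    write_records(path, [(i, f"customer{i}", 100.5 * i) for i in range(6)])
    print(read_record(path, 3))                         # (3, 'customer3', 301.5)
    print([len(block) for block in read_blocks(path)])  # [4, 2]
```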


Random access to blocks on tape can take more time than sequential access
because there can be more tape movement relative to the amount of data
read.

Random access is more efficient on disk drives. Read-write heads of disk drives
may need to move to be in the correct position to read a data block, but there
is less movement than with tapes. Disk read-write heads only need to move at
most the radius of the disk. Tape drives may need to move the full length of a
tape to retrieve a data block.

Classified as Microsoft Confidential


A bit of history - Hierarchical Data Model Systems and Network Data Management Systems

Hierarchical data model
- The hierarchical model is organized into a set of parent-child relations.
- Searching is more efficient than with flat files.
- We may need to duplicate data. One customer with one loan is easily managed. One customer with three loans is easily managed. Two customers with one loan, such as two business partners taking out a short-term business loan, are not so easily represented.

Network data model
- A simple network schema shows which entities can link to other entities.
- Child nodes can have multiple parents (many-to-many relation).
- The limitation of network databases is that they can be difficult to design and maintain. Depending on how nodes are linked, a program may need to traverse many links to get to the node with the needed data.
- You cannot have cycles in the graph.
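The two-partners loan example can be modelled in a few lines. This is a simplified Python sketch, not the record structure of IMS or IDMS; the class names, customer names and amounts are hypothetical. In the hierarchical version each loan can belong to only one parent, so a shared loan is duplicated and the copies can drift apart; in the network version one loan record simply has two parents.

```python
from dataclasses import dataclass, field


# Hierarchical model: every child record has exactly one parent.
@dataclass
class Loan:
    loan_id: str
    amount: float


@dataclass
class HCustomer:
    name: str
    loans: list[Loan] = field(default_factory=list)  # children owned by this parent


# Network model: a child record may be linked to several parent records.
@dataclass(eq=False)
class NLoan:
    loan_id: str
    amount: float
    owners: list["NCustomer"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class NCustomer:
    name: str
    loans: list[NLoan] = field(default_factory=list)


def link(customer: NCustomer, loan: NLoan) -> None:
    """Create a parent-child link in both directions."""
    customer.loans.append(loan)
    loan.owners.append(customer)


# Two business partners take out one short-term loan together.
# Hierarchical: the loan must be stored once under each parent (duplicated data).
ana = HCustomer("Ana", [Loan("L1", 50_000.0)])
rui = HCustomer("Rui", [Loan("L1", 50_000.0)])
ana.loans[0].amount = 45_000.0  # a repayment updates only one copy
print(ana.loans[0].amount, rui.loans[0].amount)  # 45000.0 50000.0 (inconsistent)

# Network: a single loan record with two parents, so there is nothing to keep in sync.
shared = NLoan("L1", 50_000.0)
ana_n, rui_n = NCustomer("Ana"), NCustomer("Rui")
link(ana_n, shared)
link(rui_n, shared)
shared.amount = 45_000.0
print(ana_n.loans[0].amount, rui_n.loans[0].amount)  # 45000.0 45000.0
```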


This graph has cycles and, therefore, is not a directed acyclic graph and not a
model of a network data management system.
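Whether a set of links forms a directed acyclic graph can be checked mechanically. The Python sketch below uses the standard depth-first search with three node states; the record types and links in the example are made up for illustration and are not the graph in the figure.

```python
def has_cycle(graph: dict[str, list[str]]) -> bool:
    """Return True if the directed graph contains a cycle (DFS with node colouring)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {node: WHITE for node in graph}
    for targets in graph.values():
        for t in targets:
            colour.setdefault(t, WHITE)  # nodes that only appear as link targets

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in graph.get(node, []):
            if colour[nxt] == GREY:  # back edge: we reached an ancestor again
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(colour))


# Hypothetical record types linked as in a network schema (acyclic).
schema = {"Customer": ["Loan", "Account"], "Loan": ["Payment"], "Account": []}
# Adding a link from Payment back to Customer closes a loop.
cyclic = {**schema, "Payment": ["Customer"]}
print(has_cycle(schema), has_cycle(cyclic))  # False True
```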



A bit of history – Relational Database Management Systems

• Although network and hierarchical data management systems improved on flat file data
management systems, it was not until 1970, when E. F. Codd published a paper on the
design of a new type of database, that data management technology changed radically.

• Relational databases separated the logical organization of data structures from the
physical storage of those structures. Codd and others developed rules for designing
relational databases that eliminated the potential for some types of data anomalies, such
as inconsistent data.

• The Structured Query Language (SQL) was a great advantage over flat files: a query
states which data is wanted rather than how to locate it in a file.
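Two ideas from this slide, a declarative query language and the separation of logical structure from physical storage, can be seen in a few lines. The sketch uses Python's built-in sqlite3 module as a stand-in for any RDBMS (the course itself uses SQL Server); the table and data are hypothetical. Adding an index changes how rows are physically located, but the query text stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [("Ana", "Lisboa"), ("Rui", "Porto"), ("Marta", "Lisboa")],
)

# Declarative: say WHAT is wanted, not how to scan a file to find it.
QUERY = "SELECT name FROM customer WHERE city = ? ORDER BY name"
before = [row[0] for row in conn.execute(QUERY, ("Lisboa",))]

# Change the physical organization by adding an index; the query is unchanged.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
after = [row[0] for row in conn.execute(QUERY, ("Lisboa",))]

print(before, after)  # ['Ana', 'Marta'] ['Ana', 'Marta']
```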



A bit of history

What happened around the year 2000 that made it clear that RDBMSs had some limitations?



Benefits of Relational Databases

• Designed for all purposes
• ACID (Atomicity, Consistency, Isolation, Durability)
• Strong consistency, concurrency, recovery
• SQL, a standard query language
• Lots of tools to use / community
  • Ex: Reporting Services
  • Ex: SQL Tools and Utilities for SQL Server, Azure SQL Database, and Azure SQL Data Warehouse (Tools Overview)

Terms and definitions
• Database: an organized collection of data. Databases are created to operate on large quantities of information by inputting, storing, retrieving, and managing this information.
• DBMS (Database Management System): a software package with computer programs that controls the creation, maintenance, and use of a database.
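Atomicity, the A in ACID, means a transaction's changes are applied completely or not at all. The sketch below illustrates this with Python's built-in sqlite3 module; the account table, the CHECK constraint used to reject overdrafts and the transfer function are hypothetical. The credit is deliberately written before the debit, so a failing debit leaves a partial change that the transaction must roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both updates are applied, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK constraint violated: overdraft
        return False


def balances() -> dict[str, int]:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer("A", "B", 30), balances())   # True {'A': 70, 'B': 80}
print(transfer("A", "B", 500), balances())  # False {'A': 70, 'B': 80}
```

The second transfer fails on the debit, and the credit already written to B is rolled back with it.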



NoSQL - Motivation

Whiteboard exercise – NoSQL Motivation



Demo - Azure Cosmos DB Portal



Why NoSQL?

Why is NoSQL so Popular?



Definition

“NoSQL is a set of concepts that allows the rapid and efficient processing of
data sets with a focus on performance, reliability and agility”
McCreary, Dan & Kelly, Ann (2013). Making Sense of NoSQL.



Benefits of NoSQL Databases

Elastic Scaling
• RDBMSs scale up: a bigger load needs a bigger server
• NoSQL databases scale out: data is distributed across multiple hosts (e.g. Black Friday or Christmas, when load can increase by orders of magnitude)
Big Data
• Huge increase in data
• NoSQL is designed for big data (high volume, velocity and variety of data)
Flexible Data Models
• RDBMSs require schema change management
• NoSQL databases are more relaxed about the structure of data (schemaless)
• In an RDBMS:
  • You can’t add a record that does not fit the schema
  • Unused items in a row still need a value (e.g. NULL)
  • Data types must be respected (you can’t store a string in an integer column)
  • You cannot store multiple items in a single field
• In NoSQL, all related items can be gathered in an aggregate (e.g. a document)
Economics
• Clusters of cheap commodity servers manage data and transaction volumes
• The cost per gigabyte or per transaction/second can be lower for NoSQL than for an RDBMS
DBA Specialists
• Experts are required to monitor an RDBMS
• NoSQL requires less management, with automatic repair and simpler data models


NoSQL – Value of Scale-out Architecture

Scale-out = Adding Servers (Horizontal Scaling)


Scale-down = Removing Servers (Horizontal Scaling)

Scale-up = Adding CPU and Memory (Vertical Scaling)

*We don’t use the term scale down for Vertical Scaling
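Scaling out means spreading data across more servers, which requires a rule for deciding which server holds each record. A common approach is to hash the record's key. The Python sketch below uses simple modulo hashing with hypothetical node names; it is a simplified stand-in, not how Azure Cosmos DB or any particular product partitions data.

```python
import hashlib
from collections import Counter


def node_for(key: str, nodes: list[str]) -> str:
    """Pick the node that stores `key` by hashing it (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


keys = [f"order-{i}" for i in range(10_000)]

small_cluster = ["node-1", "node-2", "node-3"]
# Scale out for a peak such as Black Friday: add servers rather than a bigger server.
big_cluster = small_cluster + ["node-4", "node-5", "node-6"]

before = Counter(node_for(k, small_cluster) for k in keys)
after = Counter(node_for(k, big_cluster) for k in keys)
moved = sum(node_for(k, small_cluster) != node_for(k, big_cluster) for k in keys)

print(sorted(before.items()))
print(sorted(after.items()))
print(f"{moved / len(keys):.0%} of keys moved")
```

With modulo hashing, going from three to six nodes keeps a key in place only when its hash modulo 6 is below 3, so roughly half of the keys move. This is one reason systems such as Dynamo and Cassandra use consistent hashing instead.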



How does NoSQL vary from RDBMS?

• Schemaless: applications are written to deal with specific documents (JSON)
• Designed to handle large, distributed databases
• Trade-offs:
  • NoSQL is not designed for ad hoc querying of data
  • Designed for speed and growth of the database
  • Relaxation of the ACID properties (Atomicity, Consistency, Isolation, Durability)

What NoSQL is
• More than rows in tables
• Free of joins
• Schemaless
• Works on many processors
• Uses shared-nothing commodity computers
• Innovative

What NoSQL is not
• Not about the SQL language
• Not only about open source
• Not only about big data
• Not about cloud computing
• Not about a clever use of RAM and SSD
• Not an elite group of products: you only need to convince others you have innovative solutions to their business problems



Normalization and Denormalization
• Database normalization is the process of organizing data into tables in a way that reduces the potential for data
anomalies. An anomaly is an inconsistency in the data.
• Denormalization introduces redundant data. Why introduce redundant data? It can cause data anomalies, and it
obviously requires more storage to keep redundant copies. The reason to risk data anomalies and use additional
storage is that denormalization can significantly improve performance.
• When data is denormalized, there is no need to read data from multiple tables and perform joins. Instead, data is
retrieved from a single document, which can be much faster than retrieving it from multiple tables.
• Designing databases entails trade-offs. You could design a highly normalized database with no redundant data but suffer poor
performance. When that happens, many designers turn to denormalization.
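The trade-off can be seen side by side in a small Python sketch: a normalized design in sqlite3 (Python's built-in module, standing in for any RDBMS) next to denormalized order documents held in a dictionary. Table names, fields and values are hypothetical. The join disappears in the document version, but renaming the customer then has to touch every embedded copy, and missing one produces exactly the kind of anomaly normalization prevents.

```python
import sqlite3

# Normalized: customer data is stored once; orders refer to it by key.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (id),
        total REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ana');
    INSERT INTO orders VALUES (10, 1, 34.0), (11, 1, 12.5);
    """
)

# Reading orders together with the customer name requires a join.
JOIN_QUERY = """
    SELECT o.id, c.name, o.total
    FROM orders AS o JOIN customer AS c ON c.id = o.customer_id
    ORDER BY o.id
"""
joined = conn.execute(JOIN_QUERY).fetchall()
print(joined)  # [(10, 'Ana', 34.0), (11, 'Ana', 12.5)]

# Denormalized: each order document embeds a redundant copy of the customer,
# so a single lookup by key returns everything needed.
documents = {
    10: {"id": 10, "customer": {"id": 1, "name": "Ana"}, "total": 34.0},
    11: {"id": 11, "customer": {"id": 1, "name": "Ana"}, "total": 12.5},
}
print(documents[10]["customer"]["name"], documents[10]["total"])  # Ana 34.0

# The cost of redundancy: a rename is one UPDATE in the normalized design,
# but every embedded copy must be changed in the denormalized one.
conn.execute("UPDATE customer SET name = 'Ana Silva' WHERE id = 1")
documents[10]["customer"]["name"] = "Ana Silva"  # document 11 is missed
print(sorted({doc["customer"]["name"] for doc in documents.values()}))  # ['Ana', 'Ana Silva']
```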



Summary

• NoSQL sessions plan and learning objectives
• History of databases and how we arrived at NoSQL
• How NoSQL compares with RDBMS
• NoSQL motivation and advantages
• How to create an Azure Cosmos DB account



Thank you!

Address: Campus de Campolide, 1070-312 Lisboa, Portugal


Tel: +351 213 828 610 | Fax: +351 213 828 611
