0% found this document useful (0 votes)
2 views

W01- Database Systems_Introduction

The document discusses the limitations of traditional file systems for managing large datasets and introduces Database Management Systems (DBMS) as a solution. It covers key components of DBMS, including query processing, transaction management, and the importance of ACID properties. Additionally, it highlights the evolution of data storage models and the distinction between data lakes and data warehouses.

Uploaded by

yasirasad341
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

W01- Database Systems_Introduction

The document discusses the limitations of traditional file systems for managing large datasets and introduces Database Management Systems (DBMS) as a solution. It covers key components of DBMS, including query processing, transaction management, and the importance of ACID properties. Additionally, it highlights the evolution of data storage models and the distinction between data lakes and data warehouses.

Uploaded by

yasirasad341
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 45

Database Systems

CSC104
Nazeef Ul Haq
Spring 2025
Problem Based Learning
• Lets start with a problem…
Example of a Traditional Database
Application
Suppose we are building a system to store the information about:
• students
• courses
• professors
• who takes what, who teaches what

4
Can we do it with the usage of files?
Sure we can! Start by storing the data in files:

students.txt courses.txt professors.txt

Now write C or Java programs to implement specific tasks

5
Doing it with a file system
• Enroll “Ali” in “CS262”:

Write a C program to do the following:

Read
Read‘students.txt’
‘students.txt’
Read
Read‘courses.txt’
‘courses.txt’
Find&update
Find&updatethetherecord
record“Ali”
“Ali”
Find&update
Find&updatethetherecord
record“CS262”
“CS262”
Write
Write“students.txt”
“students.txt”
Write
Write“courses.txt”
“courses.txt”

6
Problems with file system...
• System crashes:
Read
Read‘students.txt’
‘students.txt’
Read
Read‘courses.txt’
‘courses.txt’
Find&update
Find&updatethetherecord
record“Ali”
“Ali”
CRASH !
Find&update the record “CS262”
Find&update the record “CS262”
What is the problem ? Write
Write“students.txt”
“students.txt”
Write
Write“courses.txt”
“courses.txt”
• Large data sets (say 50GB)
• What is the problem ?
• Simultaneous access by many users?
• What happens when you mistakenly add Ali two times? How will you
manage duplicate insertions?
• Attach a course to a student which does not exist. What will happen then?
• If location of the item is unknown, what to do?
• Read, write, open and close operations etc.
• Security?
7
Problems
• Data inconsistency/anomalies
• Data redundancy
• Durability
• Concurrency
• File size etc.
A program that
makes it easy for
you to
manipulate large
amounts of data.

Opening Door of DBMS


DBMS
• Create databases
• Query
• Storage
• Durability
• Concurrency

Examples? Oracle, mySQL, postgreSQL, MS SQL Server, and Microsoft


Access etc.
DBMS
• Single Box – Components
• Double Boxes – In-memory data structures
• Solid line – Control and Data flow
• Dashed line – Data flow

• Users: Ask for data or modify data (DML and DDL)


• DB Administrator or DBA: Responsible for the structure or schema of
the database
DBMS continued..
• Query Processing
• Query
• Query Compiler (Parse and optimize query)
• Execution Engine (Executes query)
• Resource Manager (Manages locations of files and uses indexes to execute the query or find elements)
• Buffer Manager (brings data in RAM)
• Storage Manager (Gets data from disk)
• Transaction Manager (ACID)
• Atomic
• Isolation
• 1 query = 1 transaction
• 1 unit = 1 transaction = 1 or more queries
• Two parts: Concurrency Control Manager (maintains A and I of ACID) and Logging and Recovery
Manager (maintains D of ACID)
Storage and Buffer Manager
• Storage Manager
• Controls placement of data on disk and its movement between disk and main
memory
• Tracks location of files
• Obtains blocks of file on request
• Buffer Manager
• Partitions available space of main memory in buffers where disk blocks can
reside
• DBMS components will interact with buffers for information (data, metadata,
logs, stats and indexes)
Transaction Processing
• Logging
• Recovery manage will check logs and restore in case of crash.
• Communicates with buffer manager and saves logs on disk
• Concurrency
• Concurrency Control manager must assure that the individual actions of
multiple transactions are executed in such an order that the net effect is the
same as if the transactions had in fact executed in their entirety, one-at a
time.
• Locks – stored in main memory – lock table
• Deadlock Resolution
• Can stuck in locks while fighting for resources
Query Processor
• Effects performance of DBMS
• Query Compiler
• Generates query plan
• Parser (builds tree)
• Preprocessor (Relation checking and generating query plan)
• Optimizer (best solution)
• Execution Engine
• Executes the query plan as it is!!!
• Interacts with buffers
• Interacts with schedulers to avoid accessing locked data
• Interacts with log manager to log data
Expected from databases
• Speed
• Scale
• Stability
• Evolution
• Reliability
• Cost Effectiveness
ACID properties
• Atomicity (Work as a unit. Yes or No – Nothing else)
• Consistency
• Isolation (No interaction among users)
• Durability
Comparison with File System?
• Can file system provide all the tasks required by DBMS?
Collection of
Files
60’s

Hierarchical (tree-based) Network (graph)


70's

Choice for most new


80's Relational applications

90’s

Object Bases

now Knowledge Bases


Relational Model Attributes

Name Street City gpa


Student-id
• Example of tabular data in the relational model
192-83-7465 Johnson
Alma Palo Alto 3.6
019-28-3746 Smith
North Rye 2.7
192-83-7465 Johnson
Alma Palo Alto 3.2
321-12-3123 Jones
Main Harrison 4.0
019-28-3746 Smith
North Rye 3.45
Object Oriented Model

• Class Person{
public:
Person();
~Person();
float GetSalary();
float PutSalary(float&);
string Name;
int SSN;
date BirthDate;
private:
float salary;
}
Types of Data
• Structured
• Relational
• Semi-structured
• CSV
• JSON
XML etc.
• Unstructured
• Emails
• Documents
• PDFs etc.
• Binary Data
• Images
• Videos etc.
Flash Names
• Database/DB
• Database Management System (DBMS)
• Data scientist
• Data Anomalies
• Data Inconsistency
• Data Redundancy
Till now, we have
used files to store
Business Application Architecture data

Application Layer
(Web, Business Logic Data Access Layer Data
Desktop,Console)
Lets dive into some
examples
UnPack YouTube Database
Result list
Rea
Lorem
Video & description,
ipsum
d # Views, Likes

Example

Unpack Lorem
Youtu
ipsum
Youtube DB be DB
congue
Lorem
Modify
Lorem
Learn
ipsum ipsum

Related videos Upload


Relevant ads Like, Review
Every minute
on the
Internet
as per 2021
Example

Self Driving
Front panel metrics
Rea
Lorem Speed, distance
ipsum
d ETA
Example

Unpack Cars Lorem


Cars
ipsum
DB DB
congue
Lorem
Modify
Lorem
Learn
ipsum ipsum

Collisions Road models


Traffic signals Drive models
Example

Unpack
ATM DB:

Transaction
Read Balance Read Balance
Give money vs Update Balance
Update Balance Give money
Transfer $3k from a10 to a20:

1 Debit $3k from a10


2 Credit $3k to a20

Transactions

Example

Scenarios
1. Crash before 1?
2. After 1 but before 2? [Bad!! a10: 17,000, a20:
15,000]
3. After 2?
DBs are often optimized for key use cases
L
L L
o
o o
r
r r
e
e e
m
m m
Goals of L L L L
L L
o o o o
o o i
i i r r
r r r r p
p p e e
e e e e s
s s m DB m
m DB m m DB m u
u u

Special i
p
m
i
p
i
p
m i
p
i
p
m
i
p

Databases s s s s
s s
u u u u
u u
m m m m
m m
Store current data Optimize historical Run batch
(e.g., lot of reads) data (e.g., logs) Workloads
(e.g. training)

> 100 viable data engines on market


(MySql, Postgres, Oracle, IBM/SAP to
data clouds on AWS/Azure, GCP, to Spark, Cockroach/Spanner, Mongo, ….)
For
Whom?

For
What?

How?
How?

Example
Mobile Game
Report &
Share
Business/Produc
Real-Time
Game App User Events
DB
DBMS t Analysis

DB v0 Q1: 1000 users/sec? Q7: How to model/evolve game Q4: Which user
Q2: Offline? data? cohorts?
Q3: Support v1, v1’ Q8: How to scale to millions of Q5: Next features to
versions? users? build? Experiments to
Q9: When machines die, restore run?
game state gracefully? Q6: Predict ads
demand?

App designer Systems Product/Biz


designer designer
How?

Example Mobile Game Data


Exploration
Cloud Datalab
Report &
Share

Game App DB Business/Produc


Real-Time 4 t Analysis
User Events Data Analytics
1 Processing Engine
E-T-L, Dataflow BigQuery

2 3

Data system
“v1” on Cloud

1 Log user actions 3 Run queries in a


peta scale analytics
2 Store in DB, after system
Extract-Transform-
Load 4 Visualize query
How?

Example
Game App Mobile Game Data
Exploration Report &
Cloud Datalab Share
DB Business/Produc
4 t Analysis
Data Sync Data
Analytics
Processing
Engine
1 MySQL,
BigQuery
Dataflow
2 3
Data system 0 Real-Time
User Events

“v2” Cloud +
Local Local DB

0 Log user actions 3 Run queries in a


In local DB petabyte scale analytics
system
1 Data sync to cloud
4 Visualize query results
2 Store in DB, after
ETL
DBs - General + Optimized

Summary Data System - Connect DBs


to solve a problem
Data bases

Data systems

Data lakes Data Lake - Set of Data


Systems for different data
(e.g., Netflix has HD movies
(1GB?) and user logs)
Data Lake and Warehouse
Data Lake:
Repository of structured, unstructured, semi-structured or binary
data.

Data warehouse:
Repository of processed or structured data.
Data Lake - Set of Data
Systems for different data
(e.g., Netflix has HD movies
(1GB?) and user logs)

You might also like