W01- Database Systems_Introduction
CSC104
Nazeef Ul Haq
Spring 2025
Problem Based Learning
• Let's start with a problem…
Example of a Traditional Database Application
Suppose we are building a system to store the information about:
• students
• courses
• professors
• who takes what, who teaches what
Can we do it using files?
Sure we can! Start by storing the data in files:
Doing it with a file system
• Enroll “Ali” in “CS262”:
  1. Read ‘students.txt’
  2. Read ‘courses.txt’
  3. Find & update the record “Ali”
  4. Find & update the record “CS262”
  5. Write “students.txt”
  6. Write “courses.txt”
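The steps above can be sketched in C++ as follows. This is a minimal sketch under assumptions that are not in the slides: each file holds one record per line in a hypothetical `key:comma-separated-list` format (for example `Ali:CS101`), and the helper names `loadFile`, `saveFile`, `appendItem`, and `enroll` are illustrative.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <string>

// Hypothetical record format (not from the slides): one "key:list" pair per line,
// e.g. "Ali:CS101" in students.txt and "CS262:Sara" in courses.txt.
using Records = std::map<std::string, std::string>;

// Read the whole file into memory.
Records loadFile(const std::string& path) {
    Records records;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;  // skip malformed lines
        records[line.substr(0, colon)] = line.substr(colon + 1);
    }
    return records;
}

// Rewrite the whole file from memory.
void saveFile(const std::string& path, const Records& records) {
    std::ofstream out(path, std::ios::trunc);
    for (const auto& entry : records) {
        out << entry.first << ':' << entry.second << '\n';
    }
}

// Append an item to a comma-separated list.
void appendItem(std::string& list, const std::string& item) {
    if (!list.empty()) list += ',';
    list += item;
}

// Enroll a student in a course by updating both files.
void enroll(const std::string& student, const std::string& course,
            const std::string& studentsPath, const std::string& coursesPath) {
    assert(!student.empty() && !course.empty());
    Records students = loadFile(studentsPath);  // 1. Read 'students.txt'
    Records courses = loadFile(coursesPath);    // 2. Read 'courses.txt'
    appendItem(students[student], course);      // 3. Find & update the record "Ali"
    appendItem(courses[course], student);       // 4. Find & update the record "CS262"
    saveFile(studentsPath, students);           // 5. Write "students.txt"
    saveFile(coursesPath, courses);             // 6. Write "courses.txt"
}
```

Calling `enroll("Ali", "CS262", "students.txt", "courses.txt")` performs all six steps. Note that every enrollment rereads and rewrites both files in full, and that `students[student]` silently creates a record for a student who was never added.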
Problems with file system...
• System crashes:
  1. Read ‘students.txt’
  2. Read ‘courses.txt’
  3. Find & update the record “Ali”
  CRASH!
  4. Find & update the record “CS262”
  5. Write “students.txt”
  6. Write “courses.txt”
  What is the problem?
• Large data sets (say 50GB): what is the problem?
• Simultaneous access by many users?
• What happens when you mistakenly add Ali twice? How will you manage duplicate insertions?
• What happens if you attach a course to a student who does not exist?
• If the location of an item is unknown, what do you do?
• Read, write, open, and close operations etc.
• Security?
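The crash problem above can be simulated in a few lines. This sketch assumes the crash strikes between the two writes (the crash point on the slide may differ). The `Disk` map standing in for the file system, the `crashAfterFirstWrite` flag, and the line formats are illustrative assumptions.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// In-memory stand-in for the file system: file name -> file contents.
// (An assumption for illustration, so the example needs no real files.)
using Disk = std::map<std::string, std::string>;

// Enroll a student in a course using two separate writes.
// If crashAfterFirstWrite is true, a simulated crash interrupts the operation
// after "students.txt" is written but before "courses.txt" is written.
void enrollWithCrash(Disk& disk, const std::string& student,
                     const std::string& course, bool crashAfterFirstWrite) {
    assert(!student.empty() && !course.empty());
    std::string students = disk["students.txt"];       // read
    std::string courses = disk["courses.txt"];         // read
    students += student + " takes " + course + "\n";   // update in memory
    courses += course + " has " + student + "\n";      // update in memory
    disk["students.txt"] = students;                   // first write
    if (crashAfterFirstWrite) {
        throw std::runtime_error("simulated crash");
    }
    disk["courses.txt"] = courses;                     // second write
}

// The two files agree only if both or neither mention the enrollment.
bool isConsistent(const Disk& disk, const std::string& student,
                  const std::string& course) {
    const auto s = disk.find("students.txt");
    const auto c = disk.find("courses.txt");
    const bool inStudents = s != disk.end() &&
        s->second.find(student + " takes " + course) != std::string::npos;
    const bool inCourses = c != disk.end() &&
        c->second.find(course + " has " + student) != std::string::npos;
    return inStudents == inCourses;
}
```

After a simulated crash, `isConsistent` returns false: the student file says Ali takes CS262, but the course file does not list him. Nothing in a plain file system detects or repairs this.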
Problems
• Data inconsistency/anomalies
• Data redundancy
• Durability
• Concurrency
• File size etc.
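Redundancy and inconsistency arise because a plain file accepts whatever is written to it, so every application has to hand-code its own checks. A minimal sketch of two such checks, rejecting a duplicate student and rejecting an enrollment for an unknown student, might look like this. The `Registry` class and its method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical in-memory registry; class and method names are illustrative.
class Registry {
public:
    // Returns false (and stores nothing) if the student already exists.
    bool addStudent(const std::string& name) {
        assert(!name.empty());
        return students_.insert(name).second;
    }

    // Returns false if the student is unknown or already enrolled in the course.
    bool enroll(const std::string& name, const std::string& course) {
        if (students_.count(name) == 0) {
            return false;  // would attach a course to a nonexistent student
        }
        return enrollments_.insert(std::make_pair(name, course)).second;
    }

    std::size_t studentCount() const { return students_.size(); }

private:
    std::set<std::string> students_;
    std::set<std::pair<std::string, std::string>> enrollments_;
};
```

Adding “Ali” twice leaves `studentCount()` at 1, and enrolling a student who was never added returns false instead of creating a dangling record.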
A program that makes it easy for you to manipulate large amounts of data.
90’s
Object Bases
class Person {
public:
    Person();
    ~Person();
    float GetSalary();
    float PutSalary(float&);
    string Name;
    int SSN;
    date BirthDate;   // 'date' is assumed to be a user-defined type
private:
    float salary;
};
Types of Data
• Structured
  • Relational
• Semi-structured
  • CSV
  • JSON
  • XML etc.
• Unstructured
  • Emails
  • Documents
  • PDFs etc.
• Binary Data
  • Images
  • Videos etc.
Flash Names
• Database/DB
• Database Management System (DBMS)
• Data scientist
• Data Anomalies
• Data Inconsistency
• Data Redundancy
Till now, we have used files to store data
Business Application Architecture
• Application Layer (Web, Desktop, Console)
• Business Logic
• Data Access Layer
• Data
Let's dive into some examples
Example: Unpack YouTube DB
• Read: result list; video & description, # views, likes
• Modify
• Learn
Self Driving
• Read: front panel metrics (speed, distance, ETA)
Example: Unpack ATM DB: Transaction
• Read Balance, Give money, Update Balance
  vs
• Read Balance, Update Balance, Give money
Example: Transactions
Transfer $3k from a10 to a20:
Scenarios
1. Crash before 1?
2. After 1 but before 2? [Bad!! a10: 17,000, a20: 15,000]
3. After 2?
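The scenarios above can be sketched in code. This sketch assumes that step 1 debits a10 and step 2 credits a20, with starting balances of 20,000 and 15,000 inferred from the “Bad” state on the slide. The `crashPoint` parameter and the snapshot-based rollback are illustrative stand-ins for the all-or-nothing behavior a DBMS transaction provides, not a description of how a real DBMS implements it.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

using Accounts = std::map<std::string, long>;

// Non-atomic transfer: a crash after step 1 leaves money missing.
// crashPoint: 0 = crash before step 1, 1 = crash between steps 1 and 2,
// any other value = no crash.
void transferUnsafe(Accounts& accounts, const std::string& from,
                    const std::string& to, long amount, int crashPoint) {
    assert(amount > 0);
    if (crashPoint == 0) throw std::runtime_error("crash before step 1");
    accounts[from] -= amount;  // step 1: debit (assumed)
    if (crashPoint == 1) throw std::runtime_error("crash after step 1");
    accounts[to] += amount;    // step 2: credit (assumed)
}

// All-or-nothing transfer: restore the snapshot if anything fails midway.
void transferAtomic(Accounts& accounts, const std::string& from,
                    const std::string& to, long amount, int crashPoint) {
    const Accounts snapshot = accounts;  // remember the state before the transfer
    try {
        transferUnsafe(accounts, from, to, amount, crashPoint);
    } catch (...) {
        accounts = snapshot;  // roll back the partial update
        throw;                // let the caller know the transfer failed
    }
}
```

With `crashPoint == 1`, `transferUnsafe` leaves a10 at 17,000 and a20 at 15,000, so $3k has vanished. `transferAtomic` restores both balances to their starting values, so the transfer either happens completely or not at all.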
DBs are often optimized for key use cases
Goals of Special Databases
• Store current data (e.g., lots of reads)
• Optimize historical data (e.g., logs)
• Run batch workloads (e.g., training)
For
What?
How?
How?
Example: Mobile Game
Components: Game App, DB v0, Real-Time User Events, DB, DBMS, Report & Share, Business/Product Analysis
• Q1: 1000 users/sec?
• Q2: Offline?
• Q3: Support v1, v1’ versions?
• Q4: Which user cohorts?
• Q5: Next features to build? Experiments to run?
• Q6: Predict ads demand?
• Q7: How to model/evolve game data?
• Q8: How to scale to millions of users?
• Q9: When machines die, restore game state gracefully?
Example: Mobile Game
Data system “v1” on Cloud
• Game App, DB
• Data Sync
• Data Processing
• Analytics Engine
• Data Exploration (Cloud Datalab)
• Report & Share, Business/Product Analysis
• Tools shown: MySQL, BigQuery, Dataflow
Data system “v2” Cloud + Local
• Real-Time User Events
• Local DB
Data systems
• Data warehouse: repository of processed or structured data.
• Data lake: set of data systems for different data (e.g., Netflix has HD movies (1GB?) and user logs).