Week1 Lecture
Introduction to Databases
Logistics
Syllabus Overview
Pre-requisites
• Programming 1
• Algorithms 2
What you will achieve
References
• Textbooks
• [RG] Ramakrishnan R. and Gehrke J. Database Management
Systems, 3rd edition. McGraw-Hill Science/Engineering/Math, 2002.
• DMS solution manual for the odd-numbered exercises
• [PW] Database Systems with SQL. ZyBooks. John Wiley & Sons, Inc. 2022.
(Available through Canvas).
[Figure: instructor background. Recoverable labels: CEO; Montreal, Rabat; PhD; Universite Mohammed V; Engineer; BSc; IBM Toronto Lab; Al Akhawayn University; Microsoft Research PhD Fellowship]
Our Work
• Topics:
• Data Engineering: data cleaning, data discovery, data integration
• Large scale data processing and analytics
• IoT/Time Series data management
• Deep network embeddings
Research Directions
Scalable and accurate analytics
Efficient similarity search on massive collections of high-dimensional vectors and, thus, efficient analytics on high-dimensional vectors (e.g., classification)
Learning Outcomes
Why Are Databases Important?
The Data Pipeline
[Figure: the data pipeline of a business application, built up over several slides. Recoverable stage labels: User requirements, Problem Definition; Feature Engineering; Statistical/ML Models; Visualization, Presentation; Knowledge, Insights]
Key Terminology
• Data
• Data is numeric, textual, visual, or audio information that describes real-world
systems.
• Analog
• Historically, data was mostly analog, encoded as continuous variations on
various physical media.
• Digital
• Today, data is mostly digital, encoded as zeros and ones on electronic and
magnetic media.
Key Terminology
• Database:
• A database is a collection of data in a structured format.
• Database system / Database management system / DBMS
• A database system, also known as a database management system or DBMS, is
software that reads and writes data in a database. Database systems ensure data is
secure, internally consistent, and available at all times.
• Query Language
• A query language is a specialized programming language, designed specifically for
database systems.
• Database Application
• A database application is software that helps business users interact with database
systems.
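To make the four terms concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite plays the role of the DBMS, SQL is the query language, and the small Python function acts as the database application. The Customer table and its rows are hypothetical and are used only for illustration.

```python
import sqlite3

def run_demo():
    # The DBMS: SQLite, accessed through Python's standard sqlite3 module.
    # ":memory:" creates a temporary database that lives only in RAM.
    conn = sqlite3.connect(":memory:")

    # The database: data stored in a structured format (a table with typed columns).
    # Customer is a hypothetical table used only for this example.
    conn.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO Customer (id, name, city) VALUES (?, ?, ?)",
        [(1, "Amina", "Rabat"), (2, "Youssef", "Fes"), (3, "Sara", "Rabat")],
    )
    conn.commit()

    # The query language: SQL states *what* we want, and the DBMS decides *how*.
    rows = conn.execute(
        "SELECT name FROM Customer WHERE city = ? ORDER BY name", ("Rabat",)
    ).fetchall()
    conn.close()

    # The database application: ordinary code that hands the result to a user.
    return [name for (name,) in rows]

print(run_demo())  # ['Amina', 'Sara']
```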
Key Database Roles
Other Data Related Careers
• Data Engineers
• Data Scientists
• Data Analysts
• Data Architects
• Chief Data Officer
Source: https://data-flair.training/blogs/data-scientist-vs-data-engineer-vs-data-analyst/
Conclusion
Databases
Overview of Database Management Systems
Why are Database Management Systems Important?
Advantages of a DBMS
• Data independence.
• Efficient data access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.
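Concurrent access and crash recovery both rest on transactions, which make a group of statements take effect completely or not at all. The following is a minimal sketch with sqlite3 that assumes a hypothetical Account table. It simulates a failure midway through a transfer by raising an exception, and the DBMS then rolls back the partial update.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    # "with conn" wraps the statements in a transaction: it commits if the
    # block succeeds and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE Account SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_midway:
            # Simulate a failure after the debit but before the credit.
            raise RuntimeError("simulated failure")
        conn.execute("UPDATE Account SET balance = balance + ? WHERE id = ?", (amount, dst))

def balances(conn):
    # Each row is an (id, balance) pair.
    return dict(conn.execute("SELECT id, balance FROM Account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100), (2, 50)])

transfer(conn, 1, 2, 30)
print(balances(conn))  # {1: 70, 2: 80}

try:
    transfer(conn, 1, 2, 30, fail_midway=True)
except RuntimeError:
    pass
print(balances(conn))  # {1: 70, 2: 80}: the partial debit was rolled back
```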
Why are transactions important?
• Prog
Structure of a DBMS
• A typical DBMS has a layered architecture.
• This is a simplified version and one of several possible architectures; each
system has its own variations.
[Figure: layered DBMS architecture. Recoverable component labels: Transaction Manager, Catalog, DB]
DBMS Products
Source: https://db-engines.com/en/ranking
Most Popular Databases
Source: Stackoverflow, 2020
Most Popular Languages
Source: Stackoverflow, 2020
Conclusion
Database
Query Languages
Learning Outcomes
Query Languages
Query Operations
SQL
Learning Outcomes
Databases
Database Design
Learning Outcomes
Database Design
Database Design Steps
Requirements Analysis
Conceptual Design
• Conceptual design builds on the requirements analysis step to develop
a high-level description of the data and the constraints on it.
• Its goal is to provide a simple and precise description of the data called
the conceptual schema.
• It is typically carried out using a graphical representation, such as the
Entity Relationship Diagram:
• Requirements are represented as entities, relationships, and attributes.
• An entity is a person, place, activity, or thing.
• A relationship is a link between entities.
• An attribute is a descriptive property of an entity. An attribute that uniquely
identifies an entity is called a key.
Conceptual Design Example
Logical Design
Logical Design Example
Physical Design
Example: Bookstore
• Logical schema:
• Author(AuthorID, FirstName, LastName, BirthDate)
• Physical schema:
• Table Author stored with index on AuthorID.
• Applications (orders and inventory applications):
• Author(AuthorID, FirstName, LastName, BirthDate)
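The bookstore schemas above can be written down in SQL. The sketch below uses sqlite3 and relies on a few assumptions. The column types are not given on the slide. The index name author_id_idx is hypothetical. The index is declared UNIQUE so that it also enforces AuthorID as a key. EXPLAIN QUERY PLAN shows that the DBMS uses this index for lookups by AuthorID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical schema: Author(AuthorID, FirstName, LastName, BirthDate).
# The column types are assumptions; the slide lists only attribute names.
conn.execute("""
    CREATE TABLE Author (
        AuthorID  INTEGER NOT NULL,
        FirstName TEXT,
        LastName  TEXT,
        BirthDate TEXT
    )
""")

# Physical schema: an index on AuthorID (author_id_idx is a hypothetical name).
# UNIQUE also makes the DBMS reject two authors with the same AuthorID.
conn.execute("CREATE UNIQUE INDEX author_id_idx ON Author (AuthorID)")

with conn:
    conn.execute("INSERT INTO Author VALUES (1, 'Jane', 'Doe', '1970-05-01')")

# Ask SQLite how it would run a lookup by AuthorID. The last column of each
# EXPLAIN QUERY PLAN row is a readable description of one evaluation step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT LastName FROM Author WHERE AuthorID = ?", (1,)
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)  # mentions author_id_idx; exact wording depends on the SQLite version
```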
Example: Bookstore
• Logical schema:
• Author split into Author_Private and Author_Public
• Physical schema:
• Tables Author_Private and Author_Public stored with an index each on AuthorID.
• Applications (orders and inventory applications):
• No changes, thanks to the (logical) data independence provided by the DBMS!
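One common way a DBMS provides this kind of independence is through a view. After the split, a view named Author recombines the two tables, so the applications' existing queries keep working. The sketch below uses sqlite3 and assumes how the columns are divided between Author_Private and Author_Public. It also assumes a TaxNumber column, taken from the next slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The original Author table is split in two. The column assignment is an
# assumption: private data (TaxNumber, BirthDate) vs. public data (names).
conn.executescript("""
    CREATE TABLE Author_Private (
        AuthorID  INTEGER PRIMARY KEY,
        TaxNumber TEXT,
        BirthDate TEXT
    );
    CREATE TABLE Author_Public (
        AuthorID  INTEGER PRIMARY KEY,
        FirstName TEXT,
        LastName  TEXT
    );
    -- The view keeps the old name and columns, so applications see no change.
    CREATE VIEW Author (AuthorID, FirstName, LastName, BirthDate) AS
        SELECT pub.AuthorID, pub.FirstName, pub.LastName, priv.BirthDate
        FROM Author_Public AS pub
        JOIN Author_Private AS priv ON priv.AuthorID = pub.AuthorID;
    INSERT INTO Author_Private VALUES (1, 'TX-001', '1970-05-01');
    INSERT INTO Author_Public  VALUES (1, 'Jane', 'Doe');
""")

# The same query an orders or inventory application issued before the split.
app_query = "SELECT FirstName, LastName, BirthDate FROM Author WHERE AuthorID = ?"
print(conn.execute(app_query, (1,)).fetchone())  # ('Jane', 'Doe', '1970-05-01')
```

This view covers queries only. An application that also updates Author would need extra machinery, which SQLite provides through INSTEAD OF triggers on the view.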
Example: Bookstore
• Physical schema:
• Tables Author_Private and Author_Public stored with an index on TaxNumber and LastName, respectively.
• Logical schema:
• No changes, thanks to the (physical) data independence provided by the DBMS!
• Applications (orders and inventory applications):
• No changes, thanks to the (physical) data independence provided by the DBMS!
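Physical data independence can be observed directly. Adding the two indexes changes how the DBMS evaluates a query but not what the query returns. The sketch below uses sqlite3 with an assumed column layout for the two tables (TaxNumber and BirthDate in Author_Private, names in Author_Public). The index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Author_Private (AuthorID INTEGER PRIMARY KEY, TaxNumber TEXT, BirthDate TEXT);
    CREATE TABLE Author_Public  (AuthorID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    INSERT INTO Author_Public VALUES (1, 'Jane', 'Doe'), (2, 'John', 'Roe'), (3, 'Ann', 'Doe');
""")

# An application query that is never modified.
query = "SELECT AuthorID, FirstName FROM Author_Public WHERE LastName = ? ORDER BY AuthorID"

def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row describes one evaluation step.
    return " ".join(str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

before_rows = conn.execute(query, ("Doe",)).fetchall()

# Physical schema change only: one index per table, as on the slide.
conn.executescript("""
    CREATE INDEX private_taxnumber_idx ON Author_Private (TaxNumber);
    CREATE INDEX public_lastname_idx   ON Author_Public (LastName);
""")

after_rows = conn.execute(query, ("Doe",)).fetchall()
after_plan = plan(query, ("Doe",))

print(after_rows)                           # [(1, 'Jane'), (3, 'Ann')]
print(before_rows == after_rows)            # True: same answer as before the indexes
print("public_lastname_idx" in after_plan)  # True: the new access path is used
```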
Data Independence
Levels of Abstraction
Database Programming
• SQL is usually combined with a general-purpose programming language such as C++, Java, or Python.
• Database programs typically use an application programming interface, or API, to simplify the use of SQL with other languages.
• An API is a library of procedures or classes that links a host programming language to a database.
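Python's standard sqlite3 module is a small example of such an API. It implements Python's DB-API interface and links a Python program to a SQLite database. The program sends SQL text, passes host-language values through ? placeholders, and receives rows back as Python tuples. The function name price_with_tax is hypothetical.

```python
import sqlite3

def price_with_tax(price, rate):
    conn = sqlite3.connect(":memory:")
    try:
        # Host-language values (price, rate) are passed to the DBMS as
        # parameters; the SQL text itself never changes.
        cur = conn.execute("SELECT ROUND(? * (1 + ?), 2)", (price, rate))
        (result,) = cur.fetchone()  # each row comes back as a Python tuple
        return result
    finally:
        conn.close()

print(price_with_tax(40.0, 0.2))  # 48.0
```

Passing values as parameters, rather than splicing them into the SQL string, also keeps user input from being interpreted as SQL (SQL injection).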
Database Programming: Example
The Book table contains book ID, title, category, and price.
Conclusion
Databases
Conceptual Design (Entity-Relationship Diagram)
Chapter 2
Overview of Database Design
ER Model Basics
[Figure: the Employees entity set]
ER Model Basics (Contd.)
• Relationship: Association among two or more entities. E.g., Attishoo works in the Pharmacy department.
• Relationship Set: Collection of similar relationships.
• An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En.
• The same entity set could participate in different relationship sets, or in different “roles” in the same set.
[Figure: Employees (cin, name, lot) and Departments (did, dname, budget) linked by a relationship with the attribute since]
Key Constraints
• An employee can work in many departments; a dept can have many employees.
• In contrast, each dept has at most one manager, according to the key constraint on Manages.
[Figure: Employees (cin, name, lot), Works_In (since) and Departments (did, dname, budget)]
Participation Constraints
• Does every department have a manager?
• If so, this is a participation constraint:
• the participation of Departments in Manages is said to be total (vs. partial).
• Total means that every department takes part in at least one Manages relationship.
[Figure: Employees (cin, name, lot), Departments (did, dname, budget) and Works_In (since)]
Weak Entities
• A weak entity can be identified uniquely only by considering the primary key of
another (owner) entity.
• Owner entity set and weak entity set must participate in a one-to-many relationship set
(one owner, many weak entities).
• Weak entity set must have total participation in this identifying relationship set.
• Weak entities have a partial key (dashed line)
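In the relational model, a weak entity set and its identifying relationship are usually stored together in one table. The key of that table combines the partial key with the owner's key. The foreign key to the owner uses ON DELETE CASCADE, so dependents disappear with their owner. The sketch below uses sqlite3. The table name Dep_Policy, the column types and the rows are assumptions based on the figure's attributes (cin, name, lot, pname, age, cost).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
    CREATE TABLE Employees (
        cin  CHAR(12) PRIMARY KEY,
        name TEXT,
        lot  INTEGER
    );
    -- Weak entity set Dependents and its identifying relationship in one table.
    CREATE TABLE Dep_Policy (
        pname TEXT,
        age   INTEGER,
        cost  REAL,
        cin   CHAR(12) NOT NULL,
        PRIMARY KEY (pname, cin),          -- partial key + owner's key
        FOREIGN KEY (cin) REFERENCES Employees (cin)
            ON DELETE CASCADE              -- no dependent without its owner
    );
    INSERT INTO Employees VALUES ('AB123', 'Attishoo', 48);
    INSERT INTO Dep_Policy VALUES ('Lina', 8, 120.0, 'AB123');
""")

# Deleting the owner also deletes its dependents.
conn.execute("DELETE FROM Employees WHERE cin = 'AB123'")
print(conn.execute("SELECT COUNT(*) FROM Dep_Policy").fetchone()[0])  # 0
```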
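[Figure: Employees (cin, name, lot) and the weak entity set Dependents (pname, age), linked by an identifying relationship with the attribute cost]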
ISA ('is a') Hierarchies
❖ As in C++ or other PLs, attributes are inherited.
❖ If we declare A ISA B, every A entity is also considered to be a B entity.
❖ Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)
❖ Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)
❖ Reasons for using ISA:
• To add descriptive attributes specific to a subclass.
• To identify entities that participate in a relationship.
[Figure: Employees (cin, name, lot) with ISA subclasses Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid)]
Binary vs. Ternary Relationships
[Figure: a ternary relationship among Employees (cin, name, lot), Dependents (pname, age) and Policies (policyid, cost), compared with a better design that uses two binary relationships, Purchaser and Beneficiary; a Country entity set (cid, cname) also appears]
Binary vs. Ternary Relationships (Contd.)
Aggregation
• Used when we have to model a relationship involving (entity sets and) a relationship set.
• Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.
[Figure: Employees linked by Monitors (until) to the aggregation of Projects (pid, started_on, pbudget), Sponsors (since) and Departments (did, dname, budget)]
• Note: in SQL, the word aggregation means something different: grouping rows that share values (GROUP BY) and computing a summary, such as COUNT or SUM, for each group.
• Design choices:
• Should a concept be modeled as an entity or an attribute?
• Should a concept be modeled as an entity or a relationship?
• Identifying relationships: Binary or ternary? Aggregation?
• Constraints in the ER Model:
• A lot of data semantics can (and should) be captured.
• But some constraints cannot be captured in ER diagrams.
Entity vs. Attribute
Entity vs. Attribute (Contd.)
• Works_In2 does not allow an employee to work in a department for two or more periods.
• Similar to the problem of
[Figure: Employees (cin, name, lot), Works_In2 (from, to) and Departments (did, dname, budget)]
Entity vs. Relationship
• First ER diagram OK if a manager gets a separate discretionary budget for each dept.
[Figure: Employees (cin, name, lot), Manages2 (since, dbudget) and Departments (did, dname, budget)]
• What if a manager gets a discretionary budget that covers all managed depts?
• Redundancy: dbudget stored for each dept managed by the manager.
• Misleading: Suggests dbudget is associated with the department-manager combination.
Summary of ER (Contd.)
Different ER Notations
• There exist different types of notations:
• Arrow notation
• Chen notation
• UML notation
• Barker notation
• IDEF1X notation
• Crow’s Foot notation
[Figure: the Employees, Policy and Dependents example drawn in different notations, with attributes cin, name, lot, cost, pname, age]
Relational Database: Definitions
• Can think of a relation as a set of rows or tuples (i.e., all rows are
distinct).
Example Instance of Employees Relation
Recall
Converting ER to Relational Model
• Tools are available to do this automatically.
• But let’s learn the basics.
• Converting relationships: 1-M
[Figure: Employees (cin, name, lot), Manages (since) and Departments (did, dname, budget)]
CREATE TABLE Manages (
    cin CHAR(12),
    did INT,
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (cin) REFERENCES Employees(cin),
    FOREIGN KEY (did) REFERENCES Departments(did)
);
• Because each department has at most one manager, Manages can instead be folded into the department table. Declaring cin NOT NULL then also enforces that every department has a manager (total participation):
[Figure: Employees (cin, name, lot), Departments (did, dname, budget), Manages (since) and Works_In (since)]
CREATE TABLE Dept_Mgr (
    did INT,
    dname VARCHAR(50),
    budget FLOAT,
    cin CHAR(12) NOT NULL,
    since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (cin) REFERENCES Employees(cin)
        ON DELETE NO ACTION
);
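The Dept_Mgr translation can be tried out in SQLite through Python's sqlite3 module. The sketch assumes minimal Employees column types and hypothetical rows, and it carries over the since attribute of Manages. It shows three constraints at work. The key on did rejects a second manager for the same department. NOT NULL on cin rejects a department without a manager (total participation). ON DELETE NO ACTION refuses to delete an employee who still manages a department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.executescript("""
    CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name VARCHAR(50), lot INT);
    CREATE TABLE Dept_Mgr (
        did    INT,
        dname  VARCHAR(50),
        budget FLOAT,
        cin    CHAR(12) NOT NULL,
        since  DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (cin) REFERENCES Employees (cin) ON DELETE NO ACTION
    );
    INSERT INTO Employees VALUES ('AB123', 'Attishoo', 48);
    INSERT INTO Dept_Mgr VALUES (1, 'Pharmacy', 10000, 'AB123', '2020-01-01');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Key constraint: a second row for did = 1 would mean a second manager.
print(rejected("INSERT INTO Dept_Mgr VALUES (1, 'Pharmacy', 5000, 'AB123', NULL)"))  # True
# Total participation: a department must have a manager.
print(rejected("INSERT INTO Dept_Mgr VALUES (2, 'Toys', 5000, NULL, NULL)"))          # True
# NO ACTION: Attishoo still manages a department, so the delete is refused.
print(rejected("DELETE FROM Employees WHERE cin = 'AB123'"))                          # True
```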
The SQL Query Language
• To find just names and cin, replace the first line:
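The original query is not reproduced here. As a hypothetical stand-in, assume it is SELECT * FROM Employees E WHERE E.lot = 48. Replacing its first line with SELECT E.name, E.cin keeps the same rows but returns only those two columns. The sketch below runs both versions in sqlite3 on made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name TEXT, lot INTEGER);
    INSERT INTO Employees VALUES
        ('AB123', 'Attishoo', 48), ('CD456', 'Smiley', 22), ('EF789', 'Smethurst', 48);
""")

# Hypothetical original query: all columns of the employees in lot 48.
all_columns = conn.execute(
    "SELECT * FROM Employees E WHERE E.lot = 48 ORDER BY E.cin"
).fetchall()

# Same FROM and WHERE clauses, new first line: keep only name and cin.
names_and_cin = conn.execute(
    "SELECT E.name, E.cin FROM Employees E WHERE E.lot = 48 ORDER BY E.cin"
).fetchall()

print(all_columns)    # [('AB123', 'Attishoo', 48), ('EF789', 'Smethurst', 48)]
print(names_and_cin)  # [('Attishoo', 'AB123'), ('Smethurst', 'EF789')]
```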
Querying Multiple Relations
• Given the following instances of Employees and Works_In
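The slide's instances are not reproduced here, so the sketch below uses hypothetical rows. It lists each employee's name together with the departments that employee works in. To do this, it joins Employees and Works_In on cin in sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Works_In (cin CHAR(12), did INTEGER, since DATE, PRIMARY KEY (cin, did));
    INSERT INTO Employees VALUES ('AB123', 'Attishoo', 48), ('CD456', 'Smiley', 22);
    INSERT INTO Works_In VALUES ('AB123', 1, '2019-03-01'), ('AB123', 2, '2021-09-15'),
                                ('CD456', 2, '2020-01-10');
""")

# Pair each Works_In row with the Employees row that has the same cin.
rows = conn.execute("""
    SELECT E.name, W.did
    FROM Employees E, Works_In W
    WHERE E.cin = W.cin
    ORDER BY E.name, W.did
""").fetchall()
print(rows)  # [('Attishoo', 1), ('Attishoo', 2), ('Smiley', 2)]
```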
Destroying and Altering Relations
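In SQL, DROP TABLE destroys a relation, including its schema and all its tuples. ALTER TABLE changes the schema of an existing relation. The sketch below shows both in sqlite3, using a hypothetical hire_year column. SQLite supports only a subset of ALTER TABLE, such as ADD COLUMN and RENAME.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name TEXT, lot INTEGER);
    INSERT INTO Employees VALUES ('AB123', 'Attishoo', 48);
""")

# ALTER TABLE: add a new field; existing tuples get NULL (None in Python) for it.
conn.execute("ALTER TABLE Employees ADD COLUMN hire_year INTEGER")
print(conn.execute("SELECT * FROM Employees").fetchall())
# [('AB123', 'Attishoo', 48, None)]

# DROP TABLE: the relation (schema and data) is destroyed.
conn.execute("DROP TABLE Employees")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []
```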
Adding and Deleting Tuples
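Tuples are added with INSERT and removed with DELETE, which deletes every tuple that satisfies its WHERE condition. The sketch below uses sqlite3 and hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name TEXT, lot INTEGER)")

with conn:
    # Insert single tuples, listing the fields explicitly.
    conn.execute("INSERT INTO Employees (cin, name, lot) VALUES ('AB123', 'Attishoo', 48)")
    conn.execute("INSERT INTO Employees (cin, name, lot) VALUES ('CD456', 'Smiley', 22)")
    conn.execute("INSERT INTO Employees (cin, name, lot) VALUES ('EF789', 'Smethurst', 48)")

with conn:
    # Delete all tuples that satisfy the condition (two of them here).
    cur = conn.execute("DELETE FROM Employees WHERE lot = 48")
    print(cur.rowcount)  # 2

print(conn.execute("SELECT cin FROM Employees").fetchall())  # [('CD456',)]
```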
Integrity Constraints (ICs)
• IC: condition that must be true for any instance of the database; e.g., domain constraints.
• ICs are specified when the schema is defined.
• ICs are checked when relations are modified.
• A legal instance of a relation is one that satisfies all specified ICs.
• The DBMS should not allow illegal instances.
• If the DBMS checks ICs, stored data is more faithful to real-world meaning.
• Avoids data entry errors, too!
Integrity constraints are rules or conditions defined when creating a database schema to ensure data accuracy, consistency, and adherence to real-world constraints. The DBMS plays a crucial role in enforcing these constraints to maintain the integrity and quality of the data stored in the database. This, in turn, helps prevent data entry errors and ensures that the data accurately represents the real-world domain it is meant to capture.
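The following small sketch shows a DBMS refusing an illegal instance. The hypothetical Staff table declares a domain-style constraint with CHECK, requiring an age between 18 and 70 (the bounds are also hypothetical). The legal tuple is accepted. The illegal one, a typical data entry error, is rejected, and the stored data is left unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ICs are specified when the schema is defined...
conn.execute("""
    CREATE TABLE Staff (
        cin  CHAR(12) PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER NOT NULL CHECK (age BETWEEN 18 AND 70)
    )
""")

def try_insert(row):
    # ...and checked whenever the relation is modified.
    try:
        with conn:
            conn.execute("INSERT INTO Staff VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

print(try_insert(("AB123", "Attishoo", 35)))  # accepted
print(try_insert(("CD456", "Smiley", 350)))   # rejected: 350 is a typo for 35
print(conn.execute("SELECT COUNT(*) FROM Staff").fetchone()[0])  # 1
```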
Primary Key Constraints
A primary key constraint is a type of integrity constraint used in relational databases to ensure the uniqueness and reliability of data. It is a fundamental concept in database management.
• A set of fields is a key for a relation if:
1. No two distinct tuples can have the same values in all key fields, and
2. This is not true for any subset of the key.
• Part 2 false? A superkey.
• A superkey may contain more attributes than necessary to uniquely identify records, making it a superset of a candidate key: it can include attributes that are not strictly required for uniqueness.
• E.g., if cin alone identifies an employee, then {cin, first name, last name} is a superkey but not a key, because it is not minimal.
• If there’s >1 key for a relation, one of the keys is chosen (by the DBA) to be the primary key.
• E.g., when both the Massar ID and the cin identify a student, either one can be chosen as the primary key.
• E.g., cin is a key for Employees. (What about name?) The set {cin, lot} is a superkey.
Foreign Key:
A foreign key is an attribute or set of attributes in one table that refers to the primary key in another table. It establishes relationships between tables and enforces referential integrity by ensuring that values in the foreign key match values in the primary key of the referenced table (or are null).
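Both key constraints can be seen in action with sqlite3. The sketch assumes Employees(cin, name, lot) as in the slides and a Works_In table whose cin is a foreign key. The rows are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Works_In (
        cin CHAR(12) REFERENCES Employees (cin),  -- foreign key
        did INTEGER,
        PRIMARY KEY (cin, did)
    );
    INSERT INTO Employees VALUES ('AB123', 'Attishoo', 48);
""")

def outcome(sql):
    try:
        with conn:
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "violates an IC"

# Primary key: a second employee with the same cin is rejected.
print(outcome("INSERT INTO Employees VALUES ('AB123', 'Smiley', 22)"))    # violates an IC
# Same name, different cin: allowed, so name alone is not a key.
print(outcome("INSERT INTO Employees VALUES ('CD456', 'Attishoo', 22)"))  # ok
# Foreign key: the referenced cin must exist in Employees.
print(outcome("INSERT INTO Works_In VALUES ('ZZ999', 1)"))                # violates an IC
print(outcome("INSERT INTO Works_In VALUES ('AB123', 1)"))                # ok
```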
Where do ICs Come From?
• ICs are based upon the semantics of the real-world enterprise that
is being described in the database relations.
• We can check a database instance to see if an IC is violated, but
we can NEVER infer that an IC is true by looking at an instance.
• An IC is a statement about all possible instances!
• From the example, we know name is not a key, but the assertion that cin is a key is given to us.
• Key and foreign key ICs are the most common; more general ICs are supported too.
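The asymmetry between checking an instance and inferring an IC can be made concrete in plain Python, with no DBMS needed. The hypothetical function violates_key can find a counterexample to a key constraint in an instance. Finding none, however, proves nothing about future instances. The rows are hypothetical.

```python
def violates_key(rows, key):
    """Return True if two tuples in this instance agree on all key attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[attr] for attr in key)
        if value in seen:
            return True
        seen.add(value)
    return False

instance = [
    {"cin": "AB123", "name": "Attishoo", "lot": 48},
    {"cin": "CD456", "name": "Smiley", "lot": 22},
]

# In this instance, no two employees share a name...
print(violates_key(instance, ["name"]))  # False
# ...yet a perfectly legal new employee can share one, so name is not a key.
instance.append({"cin": "EF789", "name": "Smiley", "lot": 35})
print(violates_key(instance, ["name"]))  # True
print(violates_key(instance, ["cin"]))   # False: consistent with cin being a key
```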
Relational Model: Summary
• A tabular representation of data.
• Simple and intuitive, currently the most widely used.
• Integrity constraints can be specified by the DBA, based on application
semantics. DBMS checks for violations.
• Two important ICs: primary and foreign keys
• In addition, we always have domain constraints.
• Powerful and natural query languages exist.
• Rules to translate ER to relational model
• We will revisit more in detail later in the course