Week1 Lecture

This document provides an overview of an introduction to databases course. It outlines the instructor, course structure and evaluation, logistics, syllabus, prerequisites, learning outcomes, and references. Key information: the instructor is Prof. Karima Echihabi; the course meets on Tuesdays and Thursdays; it covers topics such as database design, SQL, and emerging non-relational systems; and it aims for students to understand database application development and the use of a relational database.

Uploaded by

samia lachgar
Copyright
© All Rights Reserved

Course

Introduction to Databases

Professor: Karima Echihabi


Program: Computer Engineering
Session: Fall 2023
Introduction
Organization
• Instructor
  • Prof. Karima Echihabi: [email protected]
• Structure
  • Lectures
  • Hands-on Labs: Written and Programming Assignments
  • Exams
  • Project
• Evaluation
  • 20% Homework Assignments
  • 30% Midterm Exam
  • 20% Project
  • 30% Final Exam

Logistics

• Lectures
  • Tuesdays: 11:20am-12:50pm and 2:15pm-3:45pm
• Lab
  • Thursday: 10:10am-11:10am
• Course Archive:
  • https://um6p.instructure.com/courses/3671

Syllabus Overview

• Introduction to file systems and database management systems


• The Entity-Relationship model/Relational algebra
• Normalization theory
• The Structured Query Language (SQL)
• Physical Design and Indexing
• Integrity constraints, views, triggers, and authorization
• Overview of transaction management
• Overview of emerging non-relational systems

Pre-requisites

• Programming 1

• Algorithms 2

What you will achieve

• Understand how to design a database application


• Learn how to use a relational database management system
• Conduct a project that builds a real database application
• Learn about new trends in the database field, including non-relational systems

References
• Textbooks
  • [RG] Ramakrishnan R. and Gehrke J. Database Management Systems, 3rd edition. McGraw-Hill Science/Engineering/Math, 2002.
    • DMS solution manual for odd-numbered exercises
    • Publicly available PDF in Canvas
  • [PW] Database Systems with SQL. ZyBooks. John Wiley & Sons, Inc. 2022. (Available through Canvas.)
• Course slides based on both textbooks

Who is who? Instructor: Karima Echihabi

[Figure: career timeline. Engineer, Microsoft; MSc, Univ. of Toronto; PhD, Universite de Paris; CEO, Montreal and Rabat; PhD, Universite Mohammed V; Engineer, IBM Toronto Lab; BSc, Al Akhawayn University; Assistant Professor, UM6P-CS]

• Assistant Professor at UM6P-CS
  ▫ Co-chair NETYS 2021
  ▫ PC member of ACM SIGMOD, VLDB
  ▫ Publications in top data management venues
• Expertise in databases, data mining, high-dimensional/IoT data
• Formerly:
  ▫ CEO of two startups in Canada and Morocco
  ▫ Software Engineer at Microsoft Corp., Redmond, USA
    ▫ Ship-It Award for Windows 2000
  ▫ Software Engineer at IBM Toronto, Canada
    ▫ Key contributor for the Query Optimizer team

Who is who? TAs:

• Hasnae Zerouaoui: Postdoctoral Fellow
• Khaoula Abdenouri: 2nd year PhD Student
• Soufiane Goudzi: Software Engineer

Microsoft Research PhD Fellowship

Our Work

• Topics:
• Data Engineering: data cleaning, data discovery, data integration
• Large scale data processing and analytics
• IoT/Time Series data management
• Deep network embeddings

Research Directions
Scalable and accurate analytics
Efficient similarity search on massive collections of high-dimensional vectors and thus, efficient high-d vector analytics (e.g., classification)

Far-reaching fundamental and practical applications

• Various domains: Software Engineering, Finance, Agriculture, Medicine, Cybersecurity, Biology, Manufacturing, IoT
• Numerous applications: Deep Network Embeddings, Recommender Systems, Information Retrieval, Outlier Detection, Data Integration, Classification, Clustering

Databases
Why Study Databases?

Learning Outcomes

• Explain why databases are important


• Describe the data pipeline
• Define key database terminology
• Identify different database roles

Why Are Databases Important?

• Databases are at the backend of most applications


• 4th paradigm for data discovery
• Turing awards and successful startups
• Excellent career prospects:
• New professions focused on data
• Good compensation
• In hot demand

[Figure: data volume examples: ≈ 500 ZB per year; ≈ 130 TB; > 5 TB per day; > 40 PB per day]

1 TB = 10^12 Bytes, 1 PB = 10^15 Bytes, 1 ZB = 10^21 Bytes

The Data Pipeline

• Business Application: user requirements, problem definition
• Data Preparation: data selection, data ingestion, data cleaning, data integration
• Data Storage and Management: relational systems, distributed storage, big data platforms, data lakes, etc.
• Data Analysis: feature engineering, statistical/ML models
• Dissemination: visualization, presentation, knowledge, insights

Key Terminology

• Data
• Data is numeric, textual, visual, or audio information that describes real-world
systems.
• Analog
• Historically, data was mostly analog, encoded as continuous variations on
various physical media.
• Digital
• Today, data is mostly digital, encoded as zeros and ones on electronic and
magnetic media.

Key Terminology
• Database:
• A database is a collection of data in a structured format.
• Database system / Database management system / DBMS
• A database system, also known as a database management system or DBMS, is
software that reads and writes data in a database. Database systems ensure data is
secure, internally consistent, and available at all times.
• Query Language
• A query language is a specialized programming language, designed specifically for
database systems.
• Database Application
• A database application is software that helps business users interact with database
systems.

Key Database Roles

Other Data Related Careers

• Data Engineers
• Data Scientists
• Data Analysts
• Data Architects
• Chief Data Officer

Source: https://data-flair.training/blogs/data-scientist-vs-data-engineer-vs-data-analyst/

Conclusion

• Explained why databases are important


• Described the data pipeline
• Defined key database terminology
• Identified different database roles

Databases
Overview of Database Management Systems

Why are Database Management Systems Important?

• A small database could be stored as a collection of files.


• But, as data grows, using a DBMS becomes indispensable.
• Motivating example: limitations of file systems

Advantages of a DBMS

• Data independence.
• Efficient data access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.

Why are transactions important?

• A transaction is a group of queries that must be either completed or rejected as a whole.
• Execution of some, but not all, queries results in inconsistent/incorrect data.
• Example: a money transfer.
  • Two programs access a bank database. The database tracks customer deposits, credits, and account balances.
  • Program A requests that the database transfer $50 from Raul to Mai.
  • Transaction 1 deducts $50 from Raul's account and adds $50 to Mai's account.

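The all-or-nothing behavior of the money transfer can be sketched in Python with the standard `sqlite3` module. This is a minimal illustration, not the slides' own program: the `Account` table, the starting balances, and the `transfer` helper are assumptions made for the example.

```python
import sqlite3

def transfer(conn, sender, receiver, amount):
    """Move `amount` between two accounts as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE Name = ?",
                (amount, sender),
            )
            cur = conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE Name = ?",
                (amount, receiver),
            )
            if cur.rowcount != 1:
                # The deduction already ran; raising here undoes it.
                raise ValueError("unknown receiver")
        return True
    except (ValueError, sqlite3.IntegrityError):
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Name TEXT PRIMARY KEY, "
    "Balance INTEGER NOT NULL CHECK (Balance >= 0))"
)
# Starting balances are assumed values for the illustration.
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("Raul", 100), ("Mai", 20)])
conn.commit()

ok = transfer(conn, "Raul", "Mai", 50)          # both updates are applied
failed = transfer(conn, "Raul", "Nobody", 30)   # the deduction is rolled back
balances = dict(conn.execute("SELECT Name, Balance FROM Account"))
```

The second call fails after the deduction has executed, yet Raul's balance is unchanged afterwards, which is the inconsistency a transaction is meant to prevent.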
Structure of a DBMS
• A typical DBMS has a layered architecture.
• This is a simplified version and one of several possible architectures; each system has its own variations.

[Figure: DBMS Architecture, with highlighted components Query Processor, Storage Manager, Transaction Manager, and Catalog]

Source: Raghu Ramakrishnan, Johannes Gehrke: Database management systems (3. ed.). McGraw-Hill 2003, ISBN 978-0-07-115110-8, pp. I-XXXII, 1-1065

Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• This is one of several possible architectures; each system has its own variations.

[Figure: DBMS layers between Queries/Answers (top) and the DB (bottom)]
  Query Processing and Optimization
  Relational Operators
  Files and Access Methods
  Buffer Management
  Disk Space Management
  DB
[Figure note: these layers must consider concurrency control and recovery.]

DBMS Products

Source: https://db-engines.com/en/ranking

Most Popular Databases
Source: Stack Overflow, 2020

Most Popular Languages
Source: Stack Overflow, 2020

Conclusion

• Explained the advantages of managing data using a database management system (DBMS) vs. a file system.
• Motivated the importance of transactions in a DBMS.
• Listed the most popular DBMS products in the market.

Database
Query Languages

Learning Outcomes

• Understand the role of query languages and the Structured Query Language (SQL) in particular.
• Understand the four basic query operations: Create (Insert), Read (Select), Update, Delete (CRUD).
• Learn how to perform CRUD operations using SQL.

Query Languages

• A query language is a programming language used for manipulating data in a database.
• The Structured Query Language (SQL) is the standard query language for relational database management systems.
• The term NoSQL refers to a new generation of non-relational databases.

Query Operations

• A bank database stores the names and balances for two accounts: Raul and Mai.
• An insert query inserts new data into the database. Ethan's new account is inserted into the database.
• A select query retrieves information from the database. The query retrieves the names of individuals that have a balance of more than $3000.
• An update query changes existing data in the database. Raul's balance is changed from 3300 to 4500.
• A delete query removes data from the database. Mai's account is removed.

SQL

• The four basic SQL query operators are INSERT, SELECT, UPDATE, and DELETE.

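The four operations on the bank example can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `Account` table layout, Mai's balance, and Ethan's balance are invented for the illustration; only Raul's change from 3300 to 4500 and the $3000 threshold come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (Name TEXT PRIMARY KEY, Balance INTEGER)")
# Initial state: two accounts (Mai's balance is an assumed value).
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("Raul", 3300), ("Mai", 2500)])

# Create: insert Ethan's new account (balance is an assumed value).
conn.execute("INSERT INTO Account (Name, Balance) VALUES (?, ?)", ("Ethan", 5000))

# Read: names of individuals with a balance of more than 3000.
over_3000 = [
    name
    for (name,) in conn.execute(
        "SELECT Name FROM Account WHERE Balance > 3000 ORDER BY Name"
    )
]

# Update: change Raul's balance from 3300 to 4500.
conn.execute("UPDATE Account SET Balance = 4500 WHERE Name = 'Raul'")

# Delete: remove Mai's account.
conn.execute("DELETE FROM Account WHERE Name = 'Mai'")

remaining = conn.execute("SELECT Name, Balance FROM Account ORDER BY Name").fetchall()
```

Each statement maps to one CRUD letter: INSERT creates, SELECT reads, UPDATE updates, and DELETE deletes.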
Conclusion

• Explained the roles of query languages and the Structured Query Language (SQL).
• Described the four basic query operations: Create (Insert), Read (Select), Update, Delete (CRUD).
• Showed how to perform CRUD operations using SQL.

Databases
Database Design

Learning Outcomes

• Understand the importance of database design


• Get acquainted with the four main steps in database design:
requirements analysis, conceptual design, logical design and
physical design.
• Understand the concept of data independence
• Learn the two types of data independence.

Database Design

• Database design is a specification of database objects. It also refers to the process used to develop the specifications.
• Database design is typically one key part of the larger process of software design.
• Good database design promotes information consistency, reduces data redundancy, and improves database performance.

Database Design

A design document for a large database

Database Design Steps

• Database design consists of four key phases:


• Requirements Analysis
• Conceptual Design
• Logical Design
• Physical Design

Requirements Analysis

• Requirements analysis is the first step in designing a database.


• It is typically an informal process involving all stakeholders.
• It consists of understanding:
• the data
• the business rules
• the applications
• the frequent operations

Conceptual Design
• Conceptual design builds on the requirements analysis step to develop
a high-level description of the data and the constraints on it.
• Its goal is to provide a simple and precise description of the data called
the conceptual schema.
• It is typically carried out using a graphical representation, such as the
Entity Relationship Diagram:
• Requirements are represented as entities, relationships, and attributes.
• An entity is a person, place, activity, or thing.
• A relationship is a link between entities
• An attribute is a descriptive property of an entity. An attribute that uniquely
identifies an entity is called a key.

Conceptual Design Example

[Figure: conceptual diagram for a bookstore database]
[Figure: ER diagrams for a bookstore database using two different notations]

Logical Design

• Logical database design converts the conceptual model into a data model of a specific database system.
• For relational database systems, logical design converts entities, relationships, and attributes into tables, keys, and columns.
• It results in a logical schema.

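As a rough illustration of how logical design maps entities, relationships, and attributes onto tables, keys, and columns, the sketch below builds a tiny schema with Python's `sqlite3` module. The `Author`, `Book`, and `Writes` names and their columns are assumptions chosen for the example, not the bookstore schema from the lecture figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Author (                 -- entity       -> table
    AuthorID  INTEGER PRIMARY KEY,    -- key          -> primary key
    FirstName TEXT,                   -- attribute    -> column
    LastName  TEXT
);
CREATE TABLE Book (
    BookID INTEGER PRIMARY KEY,
    Title  TEXT
);
CREATE TABLE Writes (                 -- relationship -> table of foreign keys
    AuthorID INTEGER REFERENCES Author(AuthorID),
    BookID   INTEGER REFERENCES Book(BookID),
    PRIMARY KEY (AuthorID, BookID)
);
""")

# List the tables that make up the resulting logical schema.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

A many-to-many relationship such as the assumed `Writes` becomes its own table whose primary key combines the keys of the entities it links.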
Logical Design Example

[Figure: logical diagram for a bookstore database using two different notations]

Physical Design

• Physical database design refines the logical design to efficiently support the frequent query workloads.
• It can simply consist of building indexes on some tables.
• Or it may involve substantial modifications to the logical design obtained in earlier steps.
• It results in a physical schema.

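To make "building indexes on some tables" concrete, the following sketch asks SQLite for its query plan before and after an index is created. The table layout, the index name `idx_author_lastname`, and the sample query are assumptions for illustration; the exact wording of the plan text varies between SQLite versions, so it is not reproduced here.

```python
import sqlite3

def plan(conn, sql, params):
    """Return SQLite's query plan for `sql` as one string (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan text

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Author (AuthorID INTEGER PRIMARY KEY, "
    "FirstName TEXT, LastName TEXT, BirthDate TEXT)"
)

# A frequent query in the assumed workload: look authors up by last name.
query = "SELECT AuthorID FROM Author WHERE LastName = ?"

before = plan(conn, query, ("Smith",))   # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_author_lastname ON Author (LastName)")
after = plan(conn, query, ("Smith",))    # the plan now refers to the index
```

The query text is identical in both calls; only the physical design changed, which is what allows the optimizer to pick a faster access path.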
Example: Bookstore
• Logical schema:
  • Author(AuthorID, FirstName, LastName, BirthDate)
• Physical schema:
  • Table Author stored with index on AuthorID.
• Applications (orders and inventory applications):
  • Author(AuthorID, FirstName, LastName, BirthDate)

Example: Bookstore
• Logical schema:
  • Author(AuthorID, FirstName, LastName, BirthDate)
  • Author_Private(AuthorID, BirthDate, TaxNumber)
  • Author_Public(AuthorID, FirstName, LastName)
• Physical schema:
  • Tables Author_Private and Author_Public stored with an index each on AuthorID.
• Applications (orders and inventory applications):
  • No changes, thanks to the data independence property of DBMS!

Example: Bookstore
• Physical schema:
  • Tables Author_Private and Author_Public stored with two indexes each on TaxNumber and LastName respectively.
• Logical schema:
  • No changes, thanks to the data independence property of DBMS!
• Applications (orders and inventory applications):
  • No changes, thanks to the data independence property of DBMS!

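One common way to keep applications unchanged when Author is split into Author_Private and Author_Public is to define a view named Author over the two new tables. The sketch below assumes that approach (the slides do not state how the independence is implemented) and uses made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Author_Private (AuthorID INTEGER PRIMARY KEY, BirthDate TEXT, TaxNumber TEXT);
CREATE TABLE Author_Public  (AuthorID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);

-- Assumed technique: a view restores the original Author(AuthorID, FirstName,
-- LastName, BirthDate) shape on top of the two new tables.
CREATE VIEW Author AS
    SELECT pub.AuthorID, pub.FirstName, pub.LastName, priv.BirthDate
    FROM Author_Public AS pub
    JOIN Author_Private AS priv ON pub.AuthorID = priv.AuthorID;

-- Illustrative sample row.
INSERT INTO Author_Public  VALUES (1, 'Jane', 'Doe');
INSERT INTO Author_Private VALUES (1, '1970-01-01', 'TN-0001');
""")

# The application query is written against Author exactly as before the split.
author_row = conn.execute(
    "SELECT FirstName, LastName, BirthDate FROM Author WHERE AuthorID = ?", (1,)
).fetchone()
```

The application never sees TaxNumber and does not need to know that the data now lives in two tables.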
Data Independence

• Data independence is a very important advantage of using a DBMS.
• It insulates application programs from changes in the way the data is structured and stored.
• It is achieved through the use of the three levels of data abstraction:
  • External schema (i.e., applications)
  • Conceptual/Logical schema
  • Physical schema

Levels of Abstraction

• Many views, single logical schema and physical schema.
• External schemas (or Views) describe how users see the data.
• Logical schema defines logical structure.
• Physical schema describes the files and indexes used.
• Two types of data independence: logical and physical

[Figure: View 1, View 2, View 3 on top of the Logical Schema, on top of the Physical Schema, on top of the DB. Logical data independence sits between the views and the logical schema; physical data independence sits between the logical schema and the physical schema.]

Database Programming
• SQL is usually combined with a general-purpose programming language such as C++, Java, or Python.
• Database programs typically use an application programming interface, or API, to simplify the use of SQL with other languages.
• An API is a library of procedures or classes that links a host programming language to a database.

Database Programming: Example

The Book table contains book ID, title, category, and price.
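The program shown on the original slides is not recoverable from the text, so the following is only a sketch of what a database program against the Book table might look like, using Python's `sqlite3` module as the API. The column names, the sample rows, and the `books_in_category` helper are assumptions.

```python
import sqlite3

def books_in_category(conn, category):
    """Return (Title, Price) rows for one category, cheapest first."""
    cursor = conn.execute(
        "SELECT Title, Price FROM Book WHERE Category = ? ORDER BY Price",
        (category,),  # the API binds the host-language value into the SQL
    )
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Book (ID INTEGER PRIMARY KEY, Title TEXT, Category TEXT, Price REAL)"
)
# Illustrative rows only.
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?, ?)",
    [
        (1, "Intro to SQL", "Databases", 35.0),
        (2, "Query Tuning", "Databases", 50.0),
        (3, "Graph Theory", "Math", 42.5),
    ],
)

db_books = books_in_category(conn, "Databases")
```

The host language supplies control flow and data structures, while the SQL string expresses what to retrieve; the `?` placeholders keep the two cleanly separated.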

Conclusion

• Motivated the importance of database design and data independence.
• Described the four main phases of database design.
• Explained logical and physical data independence.

Databases
Conceptual Design (Entity-Relationship Diagram)
Chapter 2

Overview of Database Design

• Conceptual design: (ER Model is used at this stage.)
  • What are the entities and relationships in the enterprise?
  • What information about these entities and relationships should we store in the database?
  • What are the integrity constraints or business rules that hold?
  • A database 'schema' in the ER Model can be represented pictorially (ER diagrams).
  • Can map an ER diagram into a relational schema.

ER Model Basics

• Entity: Real-world object distinguishable from other objects. An entity is described (in DB) using a set of attributes.
• Entity Set: A collection of similar entities. E.g., all employees.
  • All entities in an entity set have the same set of attributes. (Until we consider ISA hierarchies, anyway!)
  • Each entity set has a key.
  • Each attribute has a domain.

[Figure: entity set Employees with attributes cin, name, lot]

ER Model Basics (Contd.)
• Relationship: Association among two or more entities. E.g., Attishoo works in Pharmacy department.
• Relationship Set: Collection of similar relationships.
  • An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En
  • Same entity set could participate in different relationship sets, or in different "roles" in same set.

[Figure: Employees (cin, name, lot) related to Departments (did, dname, budget) by Works_In (since); Reports_To relates Employees to itself in the roles supervisor and subordinate]

Key Constraints
• An employee can work in many departments; a dept can have many employees.
• In contrast, each dept has at most one manager, according to the key constraint on Departments in the Manages relationship.
• This key constraint results in a 1-to-many relationship.

[Figure: Employees (cin, name, lot) and Departments (did, dname, budget), related by Works_In (since) and Manages]
[Figure: relationship types: 1-to-Many, Many-to-1, Many-to-Many, 1-to-1]

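When the ER diagram is later mapped to tables, a key constraint like "each dept has at most one manager" can be enforced by making the department id the key of the Manages table. The sketch below assumes that mapping and uses Python's `sqlite3` module; the sample values (other than the names Attishoo and Pharmacy, which appear in the slides) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Employees   (cin TEXT PRIMARY KEY, name TEXT, lot INTEGER);
CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);
CREATE TABLE Manages (
    did   INTEGER PRIMARY KEY REFERENCES Departments(did),  -- at most one row per dept
    cin   TEXT NOT NULL REFERENCES Employees(cin),
    since TEXT
);
INSERT INTO Employees   VALUES ('E1', 'Attishoo', 48), ('E2', 'Smiley', 22);
INSERT INTO Departments VALUES (51, 'Pharmacy', 10000);
INSERT INTO Manages     VALUES (51, 'E1', '2023-01-01');
""")

try:
    # A second manager for department 51 violates the key constraint.
    conn.execute("INSERT INTO Manages VALUES (51, 'E2', '2023-06-01')")
    second_manager_accepted = True
except sqlite3.IntegrityError:
    second_manager_accepted = False

manager_count = conn.execute(
    "SELECT COUNT(*) FROM Manages WHERE did = 51"
).fetchone()[0]
```

Because `did` alone is the primary key of `Manages`, an employee may still manage several departments, but a department cannot have two managers.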


Participation Constraints
• Does every department have a manager?
• If so, this is a participation constraint:
• the participation of Departments in Manages is said to be total (vs. partial).
• Total means every department takes part in at least one Manages relationship.

[Figure: ER diagram. Employees (cin, name, lot) and Departments (did, dname, budget) are connected by Manages (since) and Works_In (since).]

136
Weak Entities
• A weak entity can be identified uniquely only by considering the primary key of
another (owner) entity.
• Owner entity set and weak entity set must participate in a one-to-many relationship set
(one owner, many weak entities).
• Weak entity set must have total participation in this identifying relationship set.
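• Weak entities have a partial key (underlined with a dashed line in the diagram).

[Figure: ER diagram. Employees (cin, name, lot) owns the weak entity set Dependents (pname, age) through the identifying relationship Policy (cost).]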

139
ISA ("is a") Hierarchies
❖ As in C++, or other PLs, attributes are inherited.
❖ If we declare A ISA B, every A entity is also considered to be a B entity.
❖ Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)
❖ Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)
❖ Reasons for using ISA:
• To add descriptive attributes specific to a subclass.
• To identify entities that participate in a relationship.

[Figure: ISA hierarchy. Employees (cin, name, lot) is the superclass; Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid) are its subclasses.]

140
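Binary vs. Ternary Relationships
[Figure: first design. A ternary relationship Covers connects Employees (cin, name, lot), Policies (policyid, cost), and Dependents (pname, age).]
[Figure: better design. Two binary relationships replace Covers: Purchaser between Employees and Policies, and Beneficiary between Policies and Dependents.]

143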
Binary vs. Ternary Relationships
[Figure: ER diagram. The relationship Concession (date) connects Product (pid, pname, ptype), Dealer (did, dname, daddr), and Country (cid, cname).]

144
Binary vs. Ternary Relationships (Contd.)

• Previous example illustrated a case when two binary relationships were better than one ternary relationship.
• An example in the other direction: a ternary relation Contracts
relates entity sets Parts, Departments and Suppliers, and has
descriptive attribute qty. No combination of binary relationships is
an adequate substitute:
• S “can-supply” P, D “needs” P, and D “deals-with” S does not imply that
D has agreed to buy P from S.
• How do we record qty?
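
As a rough sketch of the Contracts example, the ternary relationship can be stored as one table whose rows each record a single agreement together with its qty. The snippet runs the SQL through Python's sqlite3 module purely for convenience; the column names sid and pid, the column types, and the sample ids are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Parts       (pid INT PRIMARY KEY);
CREATE TABLE Departments (did INT PRIMARY KEY);
CREATE TABLE Suppliers   (sid INT PRIMARY KEY);
-- One row per (supplier, part, department) agreement; qty belongs to the relationship.
CREATE TABLE Contracts (
    sid INT, pid INT, did INT,
    qty INT,
    PRIMARY KEY (sid, pid, did),
    FOREIGN KEY (sid) REFERENCES Suppliers(sid),
    FOREIGN KEY (pid) REFERENCES Parts(pid),
    FOREIGN KEY (did) REFERENCES Departments(did)
);
INSERT INTO Parts VALUES (1);
INSERT INTO Departments VALUES (11), (22);
INSERT INTO Suppliers VALUES (7);
INSERT INTO Contracts VALUES (7, 1, 11, 500);
""")

# Department 11 has agreed to buy part 1 from supplier 7; department 22 has not.
contracts = conn.execute("SELECT sid, pid, did, qty FROM Contracts").fetchall()
print(contracts)
```

Three separate binary tables (can-supply, needs, deals-with) would have no single row on which to place qty, which is the point of the slide.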

145
Aggregation
• Note on terminology: in SQL, aggregation means grouping rows by some attribute and summarizing each group (e.g., grouping by the city of birth). In modeling, aggregation refers to a concept used to represent relationships between objects or entities in a system, emphasizing a whole-part relationship.
• Used when we have to model a relationship involving (entity sets and) a relationship set.
• Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.
* Aggregation vs. ternary relationship:
❖ Monitors is a distinct relationship, with a descriptive attribute.
❖ Also, can say that each sponsorship is monitored by at most one employee.

[Figure: ER diagram. Sponsors (since) relates Projects (pid, started_on, pbudget) and Departments (did, dname, budget); Monitors (until) relates Employees (cin, name, lot) to the Sponsors relationship set as a whole.]

146
Conceptual Design Using the ER Model

• Design choices:
• Should a concept be modeled as an entity or an attribute?
• Should a concept be modeled as an entity or a relationship?
• Identifying relationships: Binary or ternary? Aggregation?
• Constraints in the ER Model:
• A lot of data semantics can (and should) be captured.
• But some constraints cannot be captured in ER diagrams.

147
Entity vs. Attribute

• Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?
• Depends upon the use we want to make of address information,
and the semantics of the data:
• If we have several addresses per employee, address must be an entity
(since attributes cannot be set-valued).
• If the structure (city, street, etc.) is important, e.g., we want to retrieve
employees in a given city, address must be modeled as an entity
(since attribute values are atomic).

148
Entity vs. Attribute (Contd.)
• Works_In2 does not allow an employee to work in a department for two or more periods.
• Similar to the problem of wanting to record several addresses for an employee: We want to record several values of the descriptive attributes for each instance of this relationship. Accomplished by introducing new entity set, Duration.

[Figure: first diagram. Works_In2 (from, to) connects Employees (cin, name, lot) and Departments (did, dname, budget).]
[Figure: second diagram. from and to become attributes of a new entity set Duration, which participates in Works_In2 together with Employees and Departments.]

149
Entity vs. Relationship
• First ER diagram OK if a manager gets a separate discretionary budget for each dept.
• What if a manager gets a discretionary budget that covers all managed depts?
• Redundancy: dbudget stored for each dept managed by manager.
• Misleading: Suggests dbudget associated with department-mgr combination.
• The second diagram, which introduces the entity set Mgr_Appts, fixes the dbudget problem!

[Figure: first diagram. Manages2 (since, dbudget) connects Employees (cin, name, lot) and Departments (did, dname, budget).]
[Figure: second diagram. Mgr_Appts (apptnum, dbudget) is linked to Employees by is_manager and to Departments by managed_by.]

152
Summary of Conceptual Design
• Conceptual design follows requirements analysis,
• Yields a high-level description of data to be stored
• ER model popular for conceptual design
• Constructs are expressive, close to the way people think about their
applications.
• Basic constructs: entities, relationships, and attributes (of entities
and relationships).
• Some additional constructs: weak entities, ISA hierarchies, and
aggregation.
• Note: There are many variations on ER model.

153
Summary of ER (Contd.)

• Several kinds of integrity constraints can be expressed in the ER model: key constraints, participation constraints, and overlap/covering constraints for ISA hierarchies. Some foreign key constraints are also implicit in the definition of a relationship set.
• Some constraints (notably, functional dependencies) cannot be expressed
in the ER model.
• Constraints play an important role in determining the best database design
for an enterprise.

154
Summary of ER (Contd.)

• ER design is subjective. There are often many ways to model a given scenario! Analyzing alternatives can be tricky, especially for a large enterprise. Common choices include:
• Entity vs. attribute, entity vs. relationship, binary or n-ary relationship,
whether or not to use ISA hierarchies, and whether or not to use
aggregation.
• Ensuring good database design: resulting relational schema
should be analyzed and refined further. FD information and
normalization techniques are especially useful.

155
Different ER Notations
• There exist different types of notations
• Arrow notation
• Chen notation
• UML notation
• Barker notation
• IDEF1X notation
• Crow’s Foot notation

[Figure: the Employees (cin, name, lot), Policy (cost), Dependents (pname, age) diagram.]

157
Different ER Notations
[Figure: the Employees (cin, name, lot), Policy (cost), Dependents (pname, age) diagram redrawn in another notation.]

158
Different ER Notations
[Figure: the Employees (cin, name, lot), Policy (cost), Dependents (pname, age) diagram redrawn in another notation.]
• Note that the constraints are flipped.
• Symbol means 0-Many: an employee can manage 0 or more departments.

159
Databases
Logical Design (The Relational Model)
Chapter 3

160
Relational Database: Definitions

• Relational database: a set of relations


• Relation: made up of 2 parts:
• Instance: a table, with rows and columns. #Rows = cardinality, #fields = degree / arity.
• Schema: specifies name of relation, plus name and type of each column.
• E.g., Employees (cin: string, name: string, lot: integer)

• Can think of a relation as a set of rows or tuples (i.e., all rows are
distinct).

161
Example Instance of Employees Relation

cin name lot


B1234 Mohamed 1
C5678 Amina 2
D9012 Imane 3
E3456 Mohamed 4

❖ Cardinality = 4, degree = 3, all rows distinct


❖ Do all columns in a relation instance have to
be distinct?
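
The cardinality and degree quoted above can be checked mechanically. The following is only a sketch: it loads the example instance into an in-memory SQLite database through Python's sqlite3 module, and the column types are assumptions borrowed from the schema used later in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name VARCHAR(100), lot INT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("B1234", "Mohamed", 1), ("C5678", "Amina", 2),
     ("D9012", "Imane", 3), ("E3456", "Mohamed", 4)],
)

cursor = conn.execute("SELECT * FROM Employees")
rows = cursor.fetchall()
cardinality = len(rows)                          # number of rows (tuples)
degree = len(cursor.description)                 # number of fields (columns)
all_rows_distinct = len(set(rows)) == len(rows)  # a relation is a set of rows
print(cardinality, degree, all_rows_distinct)
```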
162
Converting ER to Relational Model

• Tools available to do this automatically, but always double check!
• But, let’s learn the basics
• Converting entities

[Figure: entity set Employees with attributes cin, name, lot.]

CREATE TABLE Employees
(
cin CHAR(12),
name VARCHAR(100),
lot INT,
PRIMARY KEY (cin)
);

cin    name     lot
B1234  Mohamed  1
C5678  Amina    2
D9012  Imane    3
E3456  Mohamed  4

163
Converting ER to Relational Model

• Tools available to do this automatically, but always double check!
• But, let’s learn the basics
• Converting relationships: M-M

[Figure: Works_In (since) connects Employees (cin, name, lot) and Departments (did, dname, budget).]

CREATE TABLE Works_In
(
since DATE,
cin CHAR(12),
did INT,
PRIMARY KEY (cin, did),
FOREIGN KEY (cin) REFERENCES Employees(cin),
FOREIGN KEY (did) REFERENCES Departments(did)
);
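
To see what the composite primary key and the two foreign keys of Works_In buy us, here is a small sketch run through Python's sqlite3 module. The department names, budgets, and dates are made-up sample values, and the helper rejected is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Employees   (cin CHAR(12) PRIMARY KEY, name VARCHAR(100), lot INT);
CREATE TABLE Departments (did INT PRIMARY KEY, dname VARCHAR(50), budget FLOAT);
CREATE TABLE Works_In (
    since DATE, cin CHAR(12), did INT,
    PRIMARY KEY (cin, did),
    FOREIGN KEY (cin) REFERENCES Employees(cin),
    FOREIGN KEY (did) REFERENCES Departments(did)
);
INSERT INTO Employees VALUES ('B1234', 'Mohamed', 1), ('C5678', 'Amina', 2);
INSERT INTO Departments VALUES (11, 'Sales', 1000.0), (22, 'Research', 2000.0);
-- Many-to-many: B1234 works in two departments, department 11 has two employees.
INSERT INTO Works_In VALUES ('2022-02-01', 'B1234', 11),
                            ('2022-03-15', 'B1234', 22),
                            ('2020-08-10', 'C5678', 11);
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# The same (cin, did) pair cannot be recorded twice.
duplicate_pair_rejected = rejected("INSERT INTO Works_In VALUES ('2023-01-01', 'B1234', 11)")
# A Works_In row must point at an existing department.
unknown_dept_rejected = rejected("INSERT INTO Works_In VALUES ('2023-01-01', 'B1234', 99)")
print(duplicate_pair_rejected, unknown_dept_rejected)
```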

164
Recall

[Figure: the kinds of relationships seen earlier: 1-to-Many, Many-to-1, Many-to-Many, and 1-to-1.]

165
Converting ER to Relational Model

• Tools available to do this automatically
• But, let’s learn the basics
• Converting relationships: 1-M

[Figure: Manages (since) connects Employees (cin, name, lot) and Departments (did, dname, budget).]

• First approach: a separate table for Manages. Because each department has at most one manager, did alone is the primary key.

CREATE TABLE Manages
(
cin CHAR(12),
did INT,
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (cin) REFERENCES Employees(cin),
FOREIGN KEY (did) REFERENCES Departments(did)
);

• Second approach: fold Manages into the Departments table.

CREATE TABLE Dept_Mgr
(
did INT,
dname VARCHAR(50),
budget FLOAT,
cin CHAR(12),
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (cin) REFERENCES Employees(cin)
);

167
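
A quick experiment, sketched with Python's sqlite3 module, shows how PRIMARY KEY (did) on Manages captures the key constraint: one employee may manage several departments, but a department cannot get a second manager. The department names, budgets, and dates are hypothetical sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Employees   (cin CHAR(12) PRIMARY KEY, name VARCHAR(100), lot INT);
CREATE TABLE Departments (did INT PRIMARY KEY, dname VARCHAR(50), budget FLOAT);
CREATE TABLE Manages (
    cin CHAR(12), did INT, since DATE,
    PRIMARY KEY (did),
    FOREIGN KEY (cin) REFERENCES Employees(cin),
    FOREIGN KEY (did) REFERENCES Departments(did)
);
INSERT INTO Employees VALUES ('B1234', 'Mohamed', 1), ('C5678', 'Amina', 2);
INSERT INTO Departments VALUES (11, 'Sales', 1000.0), (22, 'Research', 2000.0);
-- One employee managing two departments is allowed.
INSERT INTO Manages VALUES ('B1234', 11, '2022-01-01'), ('B1234', 22, '2022-06-01');
""")

# A second manager for department 11 violates PRIMARY KEY (did).
try:
    conn.execute("INSERT INTO Manages VALUES ('C5678', 11, '2023-01-01')")
    second_manager_rejected = False
except sqlite3.IntegrityError:
    second_manager_rejected = True

depts_managed_by_b1234 = conn.execute(
    "SELECT COUNT(*) FROM Manages WHERE cin = 'B1234'"
).fetchone()[0]
print(second_manager_rejected, depts_managed_by_b1234)
```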
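Converting ER to Relational Model

• Tools available to do this automatically
• But, let’s learn the basics
• Converting relationships: 1-M with total participation
• cin NOT NULL guarantees that every department has a manager; ON DELETE NO ACTION rejects the deletion of an employee who still manages a department.

[Figure: Employees (cin, name, lot) and Departments (did, dname, budget) are connected by Manages (since) and Works_In (since).]

CREATE TABLE Dept_Mgr
(
did INT,
dname VARCHAR(50),
budget FLOAT,
cin CHAR(12) NOT NULL,
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (cin) REFERENCES Employees(cin)
ON DELETE NO ACTION
);

169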
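
The effect of NOT NULL and ON DELETE NO ACTION on Dept_Mgr can be tried out directly. This is a sketch under assumptions: it uses Python's sqlite3 module, the department names and budgets are invented, and the helper rejected is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name VARCHAR(100), lot INT);
CREATE TABLE Dept_Mgr (
    did INT, dname VARCHAR(50), budget FLOAT,
    cin CHAR(12) NOT NULL,
    PRIMARY KEY (did),
    FOREIGN KEY (cin) REFERENCES Employees(cin) ON DELETE NO ACTION
);
INSERT INTO Employees VALUES ('B1234', 'Mohamed', 1);
INSERT INTO Dept_Mgr VALUES (11, 'Sales', 1000.0, 'B1234');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Total participation: a department row without a manager is illegal.
dept_without_manager_rejected = rejected(
    "INSERT INTO Dept_Mgr VALUES (22, 'Research', 2000.0, NULL)")
# NO ACTION: an employee who still manages a department cannot be deleted.
delete_of_manager_rejected = rejected("DELETE FROM Employees WHERE cin = 'B1234'")
print(dept_without_manager_rejected, delete_of_manager_rejected)
```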
Converting ER to Relational Model

• Tools available to do this automatically
• But, let’s learn the basics
• Weak Entities

[Figure: Employees (cin, name, lot) owns the weak entity set Dependents (pname, age) through the identifying relationship Policy (cost).]

CREATE TABLE Dep_Policy
(
pname VARCHAR(50),
age FLOAT,
cost FLOAT,
cin CHAR(12) NOT NULL,
PRIMARY KEY (pname, cin),
FOREIGN KEY (cin) REFERENCES Employees(cin)
ON DELETE CASCADE
);
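
The following sketch, run through Python's sqlite3 module, illustrates two properties of the Dep_Policy table: pname is only a partial key (two owners may each have a dependent with the same name), and deleting an owner removes its dependents through ON DELETE CASCADE. The dependent name, ages, and costs are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name VARCHAR(100), lot INT);
CREATE TABLE Dep_Policy (
    pname VARCHAR(50), age FLOAT, cost FLOAT,
    cin CHAR(12) NOT NULL,
    PRIMARY KEY (pname, cin),
    FOREIGN KEY (cin) REFERENCES Employees(cin) ON DELETE CASCADE
);
INSERT INTO Employees VALUES ('B1234', 'Mohamed', 1), ('C5678', 'Amina', 2);
-- Same pname under two different owners: pname alone does not identify a dependent.
INSERT INTO Dep_Policy VALUES ('Sara', 8, 120.0, 'B1234'), ('Sara', 5, 90.0, 'C5678');
""")

# Deleting the owner entity also deletes the weak entities that depend on it.
conn.execute("DELETE FROM Employees WHERE cin = 'B1234'")
remaining = conn.execute("SELECT pname, cin FROM Dep_Policy").fetchall()
print(remaining)
```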
Relational Query Languages

• A major strength of the relational model: supports simple, powerful querying of data.
• Queries can be written intuitively, and the DBMS is responsible for
efficient evaluation.
• The key: precise semantics for relational queries.
• Allows the optimizer to extensively re-order operations, and still ensure
that the answer does not change.

171
The SQL Query Language

• Developed by IBM (System R) in the 1970s
• Need for a standard since it is used by many vendors
• Standards:
• SQL-86
• SQL-89 (minor revision)
• SQL-92 (major revision)
• SQL-99 (major extensions)
• Later revisions: SQL:2003 up to SQL:2023 (current standard)

172
The SQL Query Language

• To find all employees with name 'Mohamed':

SELECT *
FROM Employees E
WHERE E.name = 'Mohamed'

Result:
cin    name     lot
B1234  Mohamed  1
E3456  Mohamed  4

• To find just names and cin, replace the first line:

SELECT E.cin, E.name
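
Both queries can be run against the example instance. The sketch below uses Python's sqlite3 module; the ORDER BY clause is an addition that only makes the row order predictable, and the column types are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name VARCHAR(100), lot INT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("B1234", "Mohamed", 1), ("C5678", "Amina", 2),
     ("D9012", "Imane", 3), ("E3456", "Mohamed", 4)],
)

# All columns of the employees named Mohamed.
all_columns = conn.execute(
    "SELECT * FROM Employees E WHERE E.name = 'Mohamed' ORDER BY E.cin"
).fetchall()

# Only cin and name: the SELECT list is the only part that changes.
cin_and_name = conn.execute(
    "SELECT E.cin, E.name FROM Employees E WHERE E.name = 'Mohamed' ORDER BY E.cin"
).fetchall()

print(all_columns)
print(cin_and_name)
```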

173
Querying Multiple Relations
• Given the following instances of Employees and Works_In

Employees:
cin    name     lot
B1234  Mohamed  1
C5678  Amina    2
D9012  Imane    3
E3456  Mohamed  4

Works_In:
cin    since       did
B1234  01/02/2022  11
C5678  10/08/2020  22
D9012  15/03/2022  33
E3456  05/06/2009  44

• What does the following query compute?

SELECT E.lot, W.did
FROM Employees E, Works_In W
WHERE E.cin = W.cin AND E.name = 'Mohamed'

• Answer: the lot and the department id of every employee named Mohamed.

lot  did
1    11
4    44
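
The join can be reproduced on the two instances with a short sketch in Python's sqlite3 module. The dates are rewritten in ISO form assuming the slide uses day/month/year, foreign keys are left out for brevity, and ORDER BY is added only to fix the row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name VARCHAR(100), lot INT);
CREATE TABLE Works_In  (cin CHAR(12), since DATE, did INT, PRIMARY KEY (cin, did));
INSERT INTO Employees VALUES ('B1234', 'Mohamed', 1), ('C5678', 'Amina', 2),
                             ('D9012', 'Imane', 3),   ('E3456', 'Mohamed', 4);
INSERT INTO Works_In VALUES ('B1234', '2022-02-01', 11), ('C5678', '2020-08-10', 22),
                            ('D9012', '2022-03-15', 33), ('E3456', '2009-06-05', 44);
""")

# Pair each employee with the Works_In rows that share its cin,
# then keep only the employees named Mohamed.
result = conn.execute("""
    SELECT E.lot, W.did
    FROM Employees E, Works_In W
    WHERE E.cin = W.cin AND E.name = 'Mohamed'
    ORDER BY E.lot
""").fetchall()
print(result)
```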

175
Destroying and Altering Relations

• DROP TABLE Employees
Destroys the relation Employees. The schema information and the tuples are deleted. (Check constraints first, e.g., foreign keys in other tables that reference Employees!)
• ALTER TABLE Employees ADD COLUMN dob DATE
The schema of Employees is altered by adding a new field; every tuple in the current instance is extended with a null value in the new field.
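
A minimal sketch of both commands, run through Python's sqlite3 module with two sample rows: after ALTER TABLE every existing tuple carries a null dob, and after DROP TABLE the relation is gone from the catalog. The catalog table sqlite_master is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name VARCHAR(100), lot INT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("B1234", "Mohamed", 1), ("C5678", "Amina", 2)],
)

# Add a new field; existing tuples are extended with NULL (None in Python).
conn.execute("ALTER TABLE Employees ADD COLUMN dob DATE")
after_alter = conn.execute("SELECT cin, dob FROM Employees ORDER BY cin").fetchall()

# Destroy the relation: schema information and tuples are both deleted.
conn.execute("DROP TABLE Employees")
tables_left = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(after_alter, tables_left)
```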

177
Adding and Deleting Tuples

• Can insert a single tuple using:

INSERT INTO Employees (cin, name, lot, dob)
VALUES ('F7890', 'Ahmed', 5, '1980-01-20')

❖ Can delete all tuples satisfying some condition (e.g., name = 'Ahmed'):

DELETE
FROM Employees E
WHERE E.name = 'Ahmed'

* Powerful variants of these commands are available; more later!
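
Here is a small sketch of the two commands in Python's sqlite3 module. As assumptions, the date is written in ISO form, a second employee is added so that something survives the delete, and the table alias is dropped from the DELETE because not every DBMS accepts one there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (cin CHAR(12) PRIMARY KEY, name VARCHAR(100), lot INT, dob DATE)")

# Insert single tuples, naming the target columns explicitly.
conn.execute("INSERT INTO Employees (cin, name, lot, dob) "
             "VALUES ('F7890', 'Ahmed', 5, '1980-01-20')")
conn.execute("INSERT INTO Employees (cin, name, lot, dob) "
             "VALUES ('B1234', 'Mohamed', 1, NULL)")
count_before = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]

# Delete every tuple that satisfies the condition.
deleted = conn.execute("DELETE FROM Employees WHERE name = 'Ahmed'").rowcount
remaining = [row[0] for row in conn.execute("SELECT cin FROM Employees")]
print(count_before, deleted, remaining)
```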


178
Integrity Constraints (ICs)

• IC: condition that must be true for any instance of the database; e.g., domain constraints.
• ICs are specified when schema is defined.
• ICs are checked when relations are modified.
• A legal instance of a relation is one that satisfies all specified ICs.
• DBMS should not allow illegal instances.
• If the DBMS checks ICs, stored data is more faithful to real-world meaning.
• Avoids data entry errors, too!

Integrity constraints are rules or conditions that are defined when creating a database schema to ensure data accuracy, consistency, and adherence to real-world constraints. The DBMS enforces these constraints to maintain the integrity and quality of the data stored in the database. This, in turn, helps prevent data entry errors and ensures that the data accurately represents the real-world domain it is meant to capture.
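
To make "the DBMS should not allow illegal instances" concrete, the sketch below declares a few ICs when the schema is defined and lets the DBMS check them on every modification. It uses Python's sqlite3 module; the rule that lot must be non-negative and that name must be present are hypothetical constraints chosen for illustration, and rejected is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Employees (
    cin  CHAR(12) PRIMARY KEY,       -- key constraint
    name VARCHAR(100) NOT NULL,      -- hypothetical rule: every employee has a name
    lot  INT CHECK (lot >= 0)        -- hypothetical rule: lot numbers are non-negative
)""")

def rejected(sql):
    """Return True if the DBMS refuses the statement because an IC would be violated."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

legal_row_rejected = rejected("INSERT INTO Employees VALUES ('B1234', 'Mohamed', 1)")
negative_lot_rejected = rejected("INSERT INTO Employees VALUES ('C5678', 'Amina', -2)")
missing_name_rejected = rejected("INSERT INTO Employees VALUES ('D9012', NULL, 3)")
duplicate_cin_rejected = rejected("INSERT INTO Employees VALUES ('B1234', 'Imane', 4)")
print(legal_row_rejected, negative_lot_rejected, missing_name_rejected, duplicate_cin_rejected)
```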

179
Primary Key Constraints

• A primary key constraint is a type of integrity constraint used in relational databases to ensure the uniqueness and reliability of data. It is a fundamental concept in database management.
• A set of fields is a key for a relation if:
1. No two distinct tuples can have same values in all key fields, and
2. This is not true for any subset of the key.
• Part 2 false? A superkey. A superkey is a set of one or more attributes (columns) that can be used to uniquely identify records (rows) within a table. It may contain more attributes than necessary, making it a superset of a candidate key. For example, if cin is a key, then {cin, first name, last name} is a superkey: the two name attributes are extra, so the set is not minimal.
• If there’s >1 key for a relation, one of the keys is chosen (by DBA) to be the primary key. For example, when both Massar ID and cin are keys, we can choose either one as the primary key.
• E.g., cin is a key for Employees. (What about name?) The set {cin, lot} is a superkey.
• Foreign key: an attribute or set of attributes in one table that refers to the primary key in another table. It establishes relationships between tables and enforces referential integrity by ensuring that values in the foreign key match values in the primary key of the referenced table.

180
Where do ICs Come From?

• ICs are based upon the semantics of the real-world enterprise that
is being described in the database relations.
• We can check a database instance to see if an IC is violated, but
we can NEVER infer that an IC is true by looking at an instance.
• An IC is a statement about all possible instances!
• From the example, we know name is not a key, but the assertion that cin is a key is given to us.
• Key and foreign key ICs are the most common; more general ICs
supported too.
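
The asymmetry between refuting and proving a key can be sketched in a few lines of plain Python. The function violates_key is a hypothetical helper: it can show that an instance contradicts a proposed key, but a clean result only means the instance is consistent with it.

```python
def violates_key(rows, key_positions):
    """Return True if two rows of the instance agree on every proposed key field."""
    seen = set()
    for row in rows:
        key_value = tuple(row[i] for i in key_positions)
        if key_value in seen:
            return True
        seen.add(key_value)
    return False

# The example instance of Employees: (cin, name, lot).
employees = [
    ("B1234", "Mohamed", 1),
    ("C5678", "Amina", 2),
    ("D9012", "Imane", 3),
    ("E3456", "Mohamed", 4),
]

# Two rows share the name Mohamed, so this instance refutes "name is a key".
name_refuted = violates_key(employees, [1])
# No violation for cin: consistent with cin being a key, but not a proof of it.
cin_refuted = violates_key(employees, [0])
print(name_refuted, cin_refuted)
```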

181
Relational Model: Summary
• A tabular representation of data.
• Simple and intuitive, currently the most widely used.
• Integrity constraints can be specified by the DBA, based on application
semantics. DBMS checks for violations.
• Two important ICs: primary and foreign keys
• In addition, we always have domain constraints.
• Powerful and natural query languages exist.
• Rules to translate ER to relational model
• We will revisit more in detail later in the course

182
