0% found this document useful (0 votes)
44 views44 pages

Data Management and Database Design: INFO 6210 Week #4

This document discusses database normalization and design. It provides an overview of different normalization forms including 1st, 2nd, 3rd normal forms as well as Boyce-Codd normal form. It also covers topics like functional dependencies, transitive dependencies, and relational algebra. Examples are given to illustrate concepts like removing repeating groups and separating partially dependent fields into new tables to achieve higher normal forms.

Uploaded by

test
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
44 views44 pages

Data Management and Database Design: INFO 6210 Week #4

This document discusses database normalization and design. It provides an overview of different normalization forms including 1st, 2nd, 3rd normal forms as well as Boyce-Codd normal form. It also covers topics like functional dependencies, transitive dependencies, and relational algebra. Examples are given to illustrate concepts like removing repeating groups and separating partially dependent fields into new tables to achieve higher normal forms.

Uploaded by

test
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 44

Data Management

and
Database Design
INFO 6210
Week #4

Northeastern University
This Week

• Different types Dependencies


• Normalization
• Relational Algebra
• Intro to SQL

2
Exercise
• Requirement
• Basketball league has many divisions
• Division has many players
• Division has many teams

Design the model. Simple task, envision 3 entities


with sample attributes

3
Will below solution works?

4
Solution

5
Normalization

6
Normalization

• Normalization is a –
• Process of refining data model build by ER diagram
• Design technique that begins by examining Relationships between attributes

7
Normalization

• Normalization is a –
• Process of refining data model build by ER diagram
• Design technique that begins by examining Relationships between attributes

• Ensures minimum redundancy of data


• Easier for users to access data
• Easy to maintain
• Takes minimum space to store data
• Reduces anomalies
• Insert, Update and Delete should affect only desired ROW

8
ERD – ER Diagram

• What is ER Diagram (ERD)?


• Model and design relational databases, in terms of logic and business rules
• This is the first step that happens for database dependent applications
• Helps to troubleshoot existing problems

• Components of ERD –
• Entity – A definable object such as customer, student, product and shown as Rectangle
• Relationship – How entities connect each other and show as diamond
• Attribute – Characteristics of an Entity and show as oval or sometimes circle.

• Limitations –
• Only for Relational DBs (Structured)
• Cannot be applied on unstructured

9
ERD – ER Diagram

• Tools –
• Erwin
• Toad Data modeler
• SQL Data modeler
• Navicat data modeler
• E/R Studio
• Generating
• Design based on code is Reverse Engineering
• Code based on Design is Forward Engineering

10
Dependencies

• Functional
• Transitive
• Partial

11
Functional Dependencies

Important concept associated with Normalization is Functional Dependencies

• The attribute B is fully functionally dependent on the attribute A, if each value of A determines
one and only one value of B

• Example: Product_ID à Product_Name


• Product ID determines Product Name
• Product Name is Dependent Attribute
• Product ID is Determinant

• We can also say Product_ID functionally determines Product_Name

12
Functional Dependencies

• Another Example

13
Partial Dependency

• A functional dependency A à B is called partial dependency if –


• There is a attribute that can be removed from A and yet the dependency still holds

Partial Dependency is similar to functional dependency when non prime-key attribute(s) are functionally dependent
on a part of the Candidate key.
Student_ID Student_fName Student_lname Course_ID Course_name PK-Part1
S001 John Smith C8899 Artificial Intelligence PK-Part2
S002 Brad Smith C8899 Artificial Intelligence Primary key
Student ID + Course ID is unique – Primary key
Course Name is dependent on Course ID and doesn’t required Student ID
To resolve this, we create a parent table called course and reference to student.

14
Transitive Dependency

• An indirect relationship between data elements


Model Manufacturer Contact#
• A, B, and C are attributes of a relation such that if
• A à B and B à C, then C is transitively dependent on A via B Camry Toyota 9819876677
• If we know Model, we can get Manufacturer, Using Manufacturer Corella Toyota 9819876677
we can get phone number Accord Honda 8889997766
• Model –> contact is transitive dependent
Civic Honda 8889997766
Acura Honda 8889997766
• Problems if we have Transitive dependency –
Kia Hyundai 8001112222
• If we delete Kia and Sonata models, then Manufacturer and
his contact is removed Sonata Hyundai 8001112222
• Cannot add Manufacturer without a Model
• Adding additional attributes of Manufacture adds repetition
• Update leads more than one row modification

15
Normalization
1st Normal Form

2nd Normal Form

3rd Normal Form

16
Normalization
1st Normal Form

2nd Normal Form

3rd Normal Form

17
Boyce-Codd normal form (BCNF)

Fourth normal form (4NF)


Database Normalization by E.F CODD

Every Non-Prime attribute of relation R is –


non-Transitively dependent which means directly dependent on
every candidate key of R.

18
First Normal form – 1NF

• Steps –
• Every column should have only one value
• Combination of row and column should not have more than one value

• Identify repeating groups of fields


• Remove repeating groups to separate table

• Identify keys for the tables


• Goal is to have a Unique key

• Key of parent table is brought as part of concatenated key to the second table

19
First Normal form – 1NF

Company
ID Company Name City Company Contact
1 ABC Corp Denver Mike
2 True Tree Org Burbank Scott, Martin
3 ACME Inc. Frisco John, Drew, Sam, Helen

20
First Normal form – 1NF

Company
ID CompanyName City Company Contact
1 ABC Corp Denver Mike
2 True Tree Org Burbank Scott, Martin
3 ACME Inc. Frisco John, Drew, Sam, Helen

Company
ID Name City CContact1 CContact2 CContact3 CContact4
1 ABC Corp Denver Mike
2 True Tree Org Burbank Scott Martin
3 ACME Inc Frisco John Drew Sam Hellen

21
First Normal form – 1NF

Company
ID CompanyName City Company_Contact
1 ABC Corp Denver ID Company_ID Contact_Name
2 True Tree Org Burbank 1 1 Mike
3 ACME Inc. Frisco 2 2 Scott
3 2 Martin
4 3 Drew
5 3 Sam
6 3 Hellen

22
Raw Data

Assumption –
Client rents a given property
only once and cannot rent
more than one property at a
time

23
Second Normal form – 2NF

• Steps –
• Check if all fields are dependent on the whole key (Entire primary key)
• Remove fields that depend on part of the key
• Group partially dependent fields as a sequence table
• Name the tables
• Identify keys to the tables

24
Second Normal form – 2NF

Course_Enrollments
ID Date Course Name # of Students Hall No
DW987 1/1/1989 DW Systems 5 9
DW987 5/1/1989 DW Systems 5 7
DBMS433 8/1/1992 Data science 9 11

25
Second Normal form – 2NF

Course
Enrollments
ID Course Name
DW987 DW Systems ID CID Date # of Hall No
Students
DBMS433 Data science
1 DW987 1/1/1989 5 9
2 DW987 5/1/1989 5 7
3 DBMS433 8/1/1992 9 11

26
Third Normal form – 3NF

• Remove Fields that –


• Depend on other non-key fields (Transitive dependency)
• Can be calculated or derived

• Group interdependent fields as separate tables.


• Identify such tables and name them with appropriate key.

27
Third Normal form – 3NF

Course_Enrollments
ID CID Date Available Hall No Capacity
1 DW987 1/1/1989 5 9 12
2 DW987 5/1/1989 8 7 10
3 DBMS433 8/1/1992 3 11 15

28
Third Normal form – 3NF

Hall_Details
Hall No Capacity
9 12
7 10
11 15

Course_Enrollments
ID CID Date Available Hall No
1 DW987 1/1/1989 5 9
2 DW987 5/1/1989 8 7
3 DBMS433 8/1/1992 3 11

29
Checkpoint 1

Which of the following is Wrong?


A. An attribute of an entity can have more than one value
B. In a row of a relational table, an attribute can have more than one value
C. In a row of a relational table, an attribute can have exactly one value or a NULL value

30
Recap
• Properties of a relation?
• Has a name which is unique in its schema
• Combination of row and column should have only one
value
• Every attribute in a relation should have unique names
• Every row is unique

31
Recap

• Natural Key – Identifier used to uniquely identify data


• Super Key – 1 or more Column groups which can uniquely identify row. Eg: A tuple is always a super key
• Candidate Key – Least number of columns that can be used to uniquely identify a row (Subset of super key)
• Primary Key – Candidate key is eligible to become a primary key

• Functional Dependency – Typically between PK and non Key columns – Help to reduce redundancy
• Partial Dependency – There is an attribute that depends on part of primary key
• Transitive Dependency – A à B, B à C, then C is transitively dependent on A.

• First Normal form – No multivalued attribute, Entries in a column are of same type.
• Second Normal form – Should be in 1NF and No partial dependency
• Third Normal form – should be in 2NF and No Transitive dependency

32
Recap

• When you normalize a relation by breaking it into two smaller relations, what must you
do to maintain data integrity? (Select all that apply)

• A. Link the relations by a common field


• B. Remove any functional dependencies from both relations
• C. Assign both relations the same primary key field(s)
• D. Create a primary key(s) for the new relation

33
Introduction to SQL

• Language used to access Data


• Key features
ü Non Procedural language
ü Unified language
ü Common language for all Relational Databases

• DDL – Create objects like Tables, Views, Indexes


• DML – Modify data
• DCL – Controlling data and access

34
Datatypes

• CHAR
• VARCHAR (VARCHAR2)
• NVARCHAR2
• NUMBER
• DATE
• BLOB
• CLOB
• RAW

35
Syntax - SQL

SELECT [COLUMN NAMES]


FROM [TABLE NAMES]
WHERE [JOIN CONDITIONS]
GROUP BY [COLUMN NAMES]
ORDER BY [COLUMN NAMES]

36
What’s in next week?

• Relational Algebra
• SQL continues…

37
Relational Algebra
38
Relational Algebra

• It is a procedural query language –


• which takes instance of relation as input to yields instance of relation as output.
• It uses operators to perform queries

• So, Any operation that we perform on Relations produces another Relation

• Relation with Operator gives Relations

39
Relational Algebra

• It is a procedural query language –


• which takes instance of relation as input to yields instance of relation as output.
• It uses operators to perform queries

• So, Any operation that we perform on Relations produces another Relation

• Relation with Operator gives Relations

• What is Unary and binary operator(s)?

40
Unary and Binary Operators

• An operator that expects more than one input is Binary operator


• Example: Addition, AND, Subtraction, Division, Multiplication

• Unary
• Factorial (!)
• ++
• -- (2 minus symbols)
• Binary
• ==
• <
• >
• !=

41
Operand and Operators
• Operators – - Operator act on Operand
•+
•- Unary

•/ Operator Operand

• Operand can be –
• Numbers
• Words Binary
Operand

Operator
• Mathematical operations on numbers
Operand
• Substring, concatenation on strings
42
Relational Algebra
Relational Algebra is operational , Simple to use
and can be used to show execution plans

Basic Operations are –

- Unary operators - Binary operators


- Selection (σ) - Cross product (X)
- Projection (π) - Difference (–)
- Rename (ρ) - Union (U)
- Intersection (∩)

43
Questions?

44

You might also like