Data Management and Database Design: INFO 6210 Week #4
Data Management and Database Design: INFO 6210 Week #4
and
Database Design
INFO 6210
Week #4
Northeastern University
This Week
2
Exercise
• Requirement
• Basketball league has many divisions
• Division has many players
• Division has many teams
3
Will below solution works?
4
Solution
5
Normalization
6
Normalization
• Normalization is a –
• Process of refining data model build by ER diagram
• Design technique that begins by examining Relationships between attributes
7
Normalization
• Normalization is a –
• Process of refining data model build by ER diagram
• Design technique that begins by examining Relationships between attributes
8
ERD – ER Diagram
• Components of ERD –
• Entity – A definable object such as customer, student, product and shown as Rectangle
• Relationship – How entities connect each other and show as diamond
• Attribute – Characteristics of an Entity and show as oval or sometimes circle.
• Limitations –
• Only for Relational DBs (Structured)
• Cannot be applied on unstructured
9
ERD – ER Diagram
• Tools –
• Erwin
• Toad Data modeler
• SQL Data modeler
• Navicat data modeler
• E/R Studio
• Generating
• Design based on code is Reverse Engineering
• Code based on Design is Forward Engineering
10
Dependencies
• Functional
• Transitive
• Partial
11
Functional Dependencies
• The attribute B is fully functionally dependent on the attribute A, if each value of A determines
one and only one value of B
12
Functional Dependencies
• Another Example
13
Partial Dependency
Partial Dependency is similar to functional dependency when non prime-key attribute(s) are functionally dependent
on a part of the Candidate key.
Student_ID Student_fName Student_lname Course_ID Course_name PK-Part1
S001 John Smith C8899 Artificial Intelligence PK-Part2
S002 Brad Smith C8899 Artificial Intelligence Primary key
Student ID + Course ID is unique – Primary key
Course Name is dependent on Course ID and doesn’t required Student ID
To resolve this, we create a parent table called course and reference to student.
14
Transitive Dependency
15
Normalization
1st Normal Form
16
Normalization
1st Normal Form
17
Boyce-Codd normal form (BCNF)
18
First Normal form – 1NF
• Steps –
• Every column should have only one value
• Combination of row and column should not have more than one value
• Key of parent table is brought as part of concatenated key to the second table
19
First Normal form – 1NF
Company
ID Company Name City Company Contact
1 ABC Corp Denver Mike
2 True Tree Org Burbank Scott, Martin
3 ACME Inc. Frisco John, Drew, Sam, Helen
20
First Normal form – 1NF
Company
ID CompanyName City Company Contact
1 ABC Corp Denver Mike
2 True Tree Org Burbank Scott, Martin
3 ACME Inc. Frisco John, Drew, Sam, Helen
Company
ID Name City CContact1 CContact2 CContact3 CContact4
1 ABC Corp Denver Mike
2 True Tree Org Burbank Scott Martin
3 ACME Inc Frisco John Drew Sam Hellen
21
First Normal form – 1NF
Company
ID CompanyName City Company_Contact
1 ABC Corp Denver ID Company_ID Contact_Name
2 True Tree Org Burbank 1 1 Mike
3 ACME Inc. Frisco 2 2 Scott
3 2 Martin
4 3 Drew
5 3 Sam
6 3 Hellen
22
Raw Data
Assumption –
Client rents a given property
only once and cannot rent
more than one property at a
time
23
Second Normal form – 2NF
• Steps –
• Check if all fields are dependent on the whole key (Entire primary key)
• Remove fields that depend on part of the key
• Group partially dependent fields as a sequence table
• Name the tables
• Identify keys to the tables
24
Second Normal form – 2NF
Course_Enrollments
ID Date Course Name # of Students Hall No
DW987 1/1/1989 DW Systems 5 9
DW987 5/1/1989 DW Systems 5 7
DBMS433 8/1/1992 Data science 9 11
25
Second Normal form – 2NF
Course
Enrollments
ID Course Name
DW987 DW Systems ID CID Date # of Hall No
Students
DBMS433 Data science
1 DW987 1/1/1989 5 9
2 DW987 5/1/1989 5 7
3 DBMS433 8/1/1992 9 11
26
Third Normal form – 3NF
27
Third Normal form – 3NF
Course_Enrollments
ID CID Date Available Hall No Capacity
1 DW987 1/1/1989 5 9 12
2 DW987 5/1/1989 8 7 10
3 DBMS433 8/1/1992 3 11 15
28
Third Normal form – 3NF
Hall_Details
Hall No Capacity
9 12
7 10
11 15
Course_Enrollments
ID CID Date Available Hall No
1 DW987 1/1/1989 5 9
2 DW987 5/1/1989 8 7
3 DBMS433 8/1/1992 3 11
29
Checkpoint 1
30
Recap
• Properties of a relation?
• Has a name which is unique in its schema
• Combination of row and column should have only one
value
• Every attribute in a relation should have unique names
• Every row is unique
31
Recap
• Functional Dependency – Typically between PK and non Key columns – Help to reduce redundancy
• Partial Dependency – There is an attribute that depends on part of primary key
• Transitive Dependency – A à B, B à C, then C is transitively dependent on A.
• First Normal form – No multivalued attribute, Entries in a column are of same type.
• Second Normal form – Should be in 1NF and No partial dependency
• Third Normal form – should be in 2NF and No Transitive dependency
32
Recap
• When you normalize a relation by breaking it into two smaller relations, what must you
do to maintain data integrity? (Select all that apply)
33
Introduction to SQL
34
Datatypes
• CHAR
• VARCHAR (VARCHAR2)
• NVARCHAR2
• NUMBER
• DATE
• BLOB
• CLOB
• RAW
35
Syntax - SQL
36
What’s in next week?
• Relational Algebra
• SQL continues…
37
Relational Algebra
38
Relational Algebra
39
Relational Algebra
40
Unary and Binary Operators
• Unary
• Factorial (!)
• ++
• -- (2 minus symbols)
• Binary
• ==
• <
• >
• !=
41
Operand and Operators
• Operators – - Operator act on Operand
•+
•- Unary
•/ Operator Operand
• Operand can be –
• Numbers
• Words Binary
Operand
Operator
• Mathematical operations on numbers
Operand
• Substring, concatenation on strings
42
Relational Algebra
Relational Algebra is operational , Simple to use
and can be used to show execution plans
43
Questions?
44