Module 2

Module 2 covers data models and normalization, focusing on the Entity-Relationship (ER) model, which organizes data into entities, attributes, and relationships. It explains various ER diagram notations and introduces the relational model, emphasizing the importance of tables, keys, and relationships for effective database design. The module also details the normalization process, outlining the steps to reduce redundancy and improve data integrity through the application of normal forms.

Uploaded by

patobaby651
Copyright
© All Rights Reserved

Module 2: Data Models and Normalization

Data Models: A Flashback

Data models are fundamental to database design. They provide a structured way
to represent real-world entities, their attributes, and the relationships among
them, all of which will be stored in the database. Think of a data model as a
blueprint for your database.

There are several types of data models, each with its own strengths and
weaknesses. We'll focus on the Entity-Relationship (ER) model, a widely used
conceptual model.

The Entity-Relationship (ER) Model

The ER model helps us understand and organize data by breaking it down into
three key components:

• Entities: These are the "things" we want to store information about. Think
of them as nouns. Examples include:

o Students: In a university database

o Books: In a library catalog

o Customers: In an e-commerce platform

o Products: In an inventory system

• Attributes: These are the properties or characteristics that describe an
entity. Think of them as adjectives. Examples include:

o Student: StudentID, Name, Major, GPA

o Book: ISBN, Title, Author, Publication Year

o Customer: CustomerID, Name, Address, Email

o Product: ProductID, Name, Price, Description


• Relationships: These describe how entities are associated with each other.
They are the verbs that connect the nouns. Examples include:

o Students enroll in Courses

o Instructors teach Courses

o Customers place Orders

o Orders contain Products
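The three components map naturally onto code. The sketch below shows one possible mapping in Python, assuming entities become classes, attributes become fields, and a relationship becomes a class holding references to the entities it connects. The class names, field names, and the sample course data are hypothetical.

```python
from dataclasses import dataclass


# Entities (the "nouns"): each field is one attribute of the entity.
@dataclass(frozen=True)
class Student:
    student_id: int
    name: str
    major: str


@dataclass(frozen=True)
class Course:
    course_id: str
    course_name: str
    credits: int


# Relationship (the "verb"): Student enrolls in Course.
# It holds references to the two entities it connects.
@dataclass(frozen=True)
class Enrollment:
    student: Student
    course: Course


alice = Student(1, "Alice", "Computer Science")
cs101 = Course("CS101", "Intro to Programming", 3)  # hypothetical course data
enrollments = [Enrollment(alice, cs101)]

# Follow the relationship: which courses is Alice enrolled in?
print([e.course.course_id for e in enrollments if e.student == alice])  # ['CS101']
```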

ER Diagram Notations: Chen, Crow's Foot, and UML

While the core concepts of the Entity-Relationship (ER) model remain consistent,
there are different notations used to represent these concepts visually in ER
diagrams. Three popular notations are Chen, Crow's Foot, and UML.

1. Chen Notation

• Developed by: Peter Chen in 1976, in the paper that introduced the ER model.

• Focus: Emphasizes the semantic aspects of entities and relationships.

• Symbols:

o Rectangles: Represent entities.

o Diamonds: Represent relationships, with the relationship name
written inside the diamond.

o Ovals: Represent attributes.

o Lines: Connect entities to attributes and relationships.

o Cardinality: Uses 1 for one and only one, and 'm' or 'n' for many.

2. Crow's Foot Notation


• Origin: Evolved from the network data model and gained popularity due
to its more visually intuitive representation of cardinality.

• Focus: Provides a clear visual representation of cardinality and modality.

• Symbols:

o Rectangles: Represent entities.

o Lines: Represent relationships.

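o Circles: Represent zero (optional participation).

o Short vertical bars: Represent one (mandatory participation).

o "Crow's foot" symbols: Represent "many" with three prongs.

o Combinations: Each end of a line pairs two symbols, a minimum and a
maximum. Circle + bar means zero or one, bar + bar means exactly one,
circle + crow's foot means zero or more, and bar + crow's foot means one
or more.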

3. UML (Unified Modeling Language) Notation

• Purpose: A general-purpose modeling language used for various software
engineering tasks, including database design.

• Focus: Provides a standardized notation for object-oriented modeling.

• Symbols:

o Rectangles: Represent classes (similar to entities).

o Lines: Represent associations (similar to relationships).

o Numbers and asterisks: Indicate multiplicity (cardinality).

o Arrows: Can indicate navigability (direction of the relationship).

ER Diagrams: Visualizing the Model

ER diagrams provide a visual representation of the ER model, making it easier to
understand the structure of the data. Here's a breakdown of the Chen-style
symbols used in this section:

• Rectangles: Represent entities (e.g., Student, Course)

• Ovals: Represent attributes (e.g., StudentID, CourseName)

• Diamonds: Represent relationships (e.g., enrolls in, teaches)

• Lines: Connect entities to their attributes and relationships

Example ER Diagram:

Imagine a simplified university database. Here's how an ER diagram might look:

[Figure: ER diagram in Chen notation. The Student entity (attributes StudentID,
Name, Major) is linked to the Course entity (attributes CourseID, CourseName,
Credits) by the "enrolls in" relationship.]

This diagram shows two entities, "Student" and "Course," with their respective
attributes. The diamond "enrolls in" represents the relationship between them.

Cardinality and Modality

ER diagrams also express the cardinality and modality of relationships:

• Cardinality: Indicates the number of instances of one entity that can be
associated with instances of another entity. Common cardinalities include:

o One-to-one (1:1): One instance of entity A is associated with only
one instance of entity B, and vice versa (e.g., one employee has one
social security number).

o One-to-many (1:M): One instance of entity A can be associated
with multiple instances of entity B, but each instance of B is
associated with only one instance of A (e.g., one department has
many employees).

o Many-to-many (M:N): Multiple instances of entity A can be
associated with multiple instances of entity B (e.g., students can
enroll in many courses, and courses can have many students).

• Modality: Indicates whether a relationship is optional or mandatory.

o Optional: An instance of an entity can exist without participating in
the relationship (e.g., a student may not be enrolled in any courses).

o Mandatory: An instance of an entity must participate in the
relationship (e.g., every course must have an instructor).

Cardinality and modality are shown with symbols or labels on the lines that
connect the relationship diamond to its entities (Chen notation), or at the
ends of the relationship line (Crow's Foot and UML).
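Cardinality and modality can also be read off actual relationship instances. Here's a minimal Python sketch, assuming a relationship is stored as a list of (a, b) pairs; the helper names `observed_cardinality` and `non_participants` and the sample pairs are hypothetical. Sample data can only suggest a cardinality: the real constraint comes from the business rules.

```python
from collections import Counter


def observed_cardinality(pairs):
    """Classify (a, b) relationship instances as '1:1', '1:M', 'M:1' or 'M:N'.

    This describes the sample only: a '1:1' result does not prove that
    the business rules forbid a second partner later.
    """
    unique = set(pairs)
    # Does any A (or B) appear in more than one distinct pair?
    a_many = any(n > 1 for n in Counter(a for a, _ in unique).values())
    b_many = any(n > 1 for n in Counter(b for _, b in unique).values())
    if a_many and b_many:
        return "M:N"
    if a_many:
        return "1:M"  # one A has many Bs, each B has one A
    if b_many:
        return "M:1"
    return "1:1"


def non_participants(instances, pairs):
    """Instances of A that take no part in the relationship (optional modality)."""
    participating = {a for a, _ in pairs}
    return sorted(set(instances) - participating)


enrolls = [("Alice", "CS101"), ("Alice", "MATH101"), ("Bob", "CS101")]
has_employee = [("Sales", "Alice"), ("Sales", "Bob"), ("Marketing", "Carol")]

print(observed_cardinality(enrolls))       # M:N
print(observed_cardinality(has_employee))  # 1:M
print(non_participants(["Alice", "Bob", "Carol"], enrolls))  # ['Carol']
```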

By understanding entities, attributes, relationships, and their properties, you can
effectively model data and create a solid foundation for your database design.
This leads us to the next topic: the relational model, which turns these concepts
into tables.

The Relational Model: Organizing Data in Tables

The relational model is the foundation of most modern database systems. It
provides a simple yet powerful way to organize and manage data using tables.
Here's a breakdown of its key components:

1. Tables:

• Structure: Data is organized into two-dimensional tables with rows and
columns.

o Rows (Records or Tuples): Each row represents a single instance
of an entity (e.g., a specific student, a particular book).

o Columns (Fields or Attributes): Each column represents a property
or characteristic of the entity (e.g., student ID, book title).

Example Table:

StudentID | Name  | Major            | GPA
----------+-------+------------------+----
1         | Alice | Computer Science | 3.8
2         | Bob   | Biology          | 3.5
3         | Carol | History          | 3.9

2. Keys:

Keys are special attributes that play a crucial role in identifying and linking data
within and across tables.

• Primary Key:

o Uniqueness: It uniquely identifies each row in a table. No two rows
can have the same primary key value.

o Non-null: It cannot contain null values.

o Example: In the "Student" table, "StudentID" would be a good
primary key.

• Foreign Key:

o Referential Integrity: It links two tables by referencing the primary key
of another table. Every non-null foreign key value must match an existing
row in the referenced table.

o Establishes Relationships: It enables you to represent relationships
between entities in the relational model.

o Example: In a table called "Enrollment," you might have a
"StudentID" column that is a foreign key referencing the "Student"
table. This links enrollment records to specific students.
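To see both kinds of key at work, here's a small sketch using SQLite through Python's built-in `sqlite3` module (other relational databases behave similarly, though syntax varies). It declares StudentID as the primary key of Student and as a foreign key in Enrollment, then checks that the database rejects a duplicate primary key and an enrollment for a student who does not exist. The `rejected` helper and the 'Dave' row are hypothetical. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is switched on (per connection).
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,  -- primary key: unique, never NULL
        Name      TEXT NOT NULL,
        Major     TEXT
    )""")
conn.execute("""
    CREATE TABLE Enrollment (
        EnrollmentID INTEGER PRIMARY KEY,
        StudentID    INTEGER NOT NULL REFERENCES Student(StudentID),  -- foreign key
        CourseID     TEXT NOT NULL,
        Grade        TEXT
    )""")

conn.execute("INSERT INTO Student VALUES (1, 'Alice', 'Computer Science')")
conn.execute("INSERT INTO Enrollment VALUES (1, 1, 'CS101', 'A')")  # student 1 exists


def rejected(sql):
    """Return True if the database refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


print(rejected("INSERT INTO Student VALUES (1, 'Dave', 'Physics')"))     # True: duplicate key
print(rejected("INSERT INTO Enrollment VALUES (2, 99, 'CS101', 'B')"))  # True: no student 99
```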

3. Relationships:

Relationships in the relational model are established through foreign keys. This
allows you to connect related data across multiple tables.

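• Types of Relationships: The same types of relationships (one-to-one,
one-to-many, many-to-many) that we saw in the ER model apply to the
relational model. They are implemented using foreign keys; a many-to-many
relationship needs an extra linking table (such as Enrollment, which links
students and courses) holding a foreign key to each side.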
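One common way to implement each relationship type is sketched below with SQLite via Python's `sqlite3` module. The table names and SSN values are hypothetical; the patterns are the usual ones: a foreign key on the "many" side for 1:M, a foreign key that is also unique (here, the table's primary key) for 1:1, and a linking table with a foreign key to each side for M:N. A NOT NULL foreign key expresses mandatory participation; a nullable one expresses optional participation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, Name TEXT NOT NULL);

    -- 1:M: the foreign key lives on the "many" side (many employees, one department).
    -- NOT NULL makes participation mandatory: every employee needs a department.
    CREATE TABLE Employee (
        EmpID  INTEGER PRIMARY KEY,
        Name   TEXT NOT NULL,
        DeptID INTEGER NOT NULL REFERENCES Department(DeptID)
    );

    -- 1:1: the foreign key is also this table's primary key, so each employee
    -- has at most one row here; UNIQUE stops two employees sharing one SSN.
    CREATE TABLE EmployeeSSN (
        EmpID INTEGER PRIMARY KEY REFERENCES Employee(EmpID),
        SSN   TEXT NOT NULL UNIQUE
    );

    -- M:N: a linking (junction) table with one row per student/course pair.
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Course  (CourseID TEXT PRIMARY KEY, CourseName TEXT NOT NULL);
    CREATE TABLE Enrolls (
        StudentID INTEGER NOT NULL REFERENCES Student(StudentID),
        CourseID  TEXT    NOT NULL REFERENCES Course(CourseID),
        PRIMARY KEY (StudentID, CourseID)
    );

    INSERT INTO Department VALUES (1, 'Sales');
    INSERT INTO Employee VALUES (1, 'Alice', 1), (2, 'Bob', 1);  -- two employees, one dept
    INSERT INTO EmployeeSSN VALUES (1, '000-00-0001');
""")

# A second SSN row for the same employee violates the 1:1 rule.
try:
    conn.execute("INSERT INTO EmployeeSSN VALUES (1, '000-00-0002')")
    one_to_one_enforced = False
except sqlite3.IntegrityError:
    one_to_one_enforced = True

print(one_to_one_enforced)  # True
```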

Example Relationship:

Consider two tables: "Student" and "Enrollment."

Student Table:

StudentID | Name  | Major
----------+-------+-----------------
1         | Alice | Computer Science
2         | Bob   | Biology
3         | Carol | History

Enrollment Table:

EnrollmentID | StudentID | CourseID | Grade
-------------+-----------+----------+------
1            | 1         | CS101    | A
2            | 2         | BIO201   | B
3            | 1         | MATH101  | A-

Here, "StudentID" in the "Enrollment" table is a foreign key referencing the
"Student" table. This establishes a one-to-many relationship between students and
enrollments (one student can have multiple enrollments).
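Following a foreign key is what a join does. The sketch below loads the two example tables into an in-memory SQLite database (via Python's `sqlite3` module) and counts enrollments per student. The LEFT JOIN keeps students with no enrollments, which is how optional participation shows up in query results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT, Major TEXT);
    CREATE TABLE Enrollment (
        EnrollmentID INTEGER PRIMARY KEY,
        StudentID    INTEGER REFERENCES Student(StudentID),
        CourseID     TEXT,
        Grade        TEXT
    );
    INSERT INTO Student VALUES
        (1, 'Alice', 'Computer Science'), (2, 'Bob', 'Biology'), (3, 'Carol', 'History');
    INSERT INTO Enrollment VALUES
        (1, 1, 'CS101', 'A'), (2, 2, 'BIO201', 'B'), (3, 1, 'MATH101', 'A-');
""")

# Follow the foreign key from each student to their enrollments.
# LEFT JOIN keeps Carol, who has no enrollments (COUNT of NULLs is 0).
rows = conn.execute("""
    SELECT s.Name, COUNT(e.EnrollmentID) AS Enrollments
    FROM Student AS s
    LEFT JOIN Enrollment AS e ON e.StudentID = s.StudentID
    GROUP BY s.StudentID, s.Name
    ORDER BY s.StudentID
""").fetchall()
print(rows)  # [('Alice', 2), ('Bob', 1), ('Carol', 0)]
```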

Key Advantages of the Relational Model:

• Simplicity: Easy to understand and use.

• Flexibility: Adaptable to changing data requirements.

• Data Integrity: Mechanisms like primary and foreign keys help enforce
data consistency and accuracy.

• Powerful Querying: Structured Query Language (SQL) provides a
powerful and flexible way to query and manipulate data.

By understanding tables, keys, and relationships in the relational model, you can
effectively design and implement databases that are efficient, scalable, and
maintain data integrity.

Normalization: Refining Your Database Design

Normalization is a crucial process in database design that focuses on organizing
data efficiently to minimize redundancy and improve data integrity. Think of it as
tidying up your data to make it more organized and consistent.

Purpose of Normalization

• Reduce Data Redundancy: Redundancy means storing the same data
multiple times. This wastes storage space and increases the risk of
inconsistencies (where different copies of the same data have different
values).

• Improve Data Integrity: Normalization helps enforce data integrity rules,
ensuring that data is accurate and consistent.

• Simplify Data Management: A well-normalized database is easier to
maintain, update, and query.
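A small illustration of the inconsistency risk, using plain Python data. The second order and the new address are hypothetical. When Alice's address is copied into every order, an update that misses one copy leaves two conflicting values; storing it once under a key avoids the problem.

```python
# Redundant design: the customer's address is copied into every order row.
orders = [
    {"OrderID": 1, "Customer": "Alice", "Address": "123 Main St"},
    {"OrderID": 2, "Customer": "Alice", "Address": "123 Main St"},
]

# Alice moves, but the update only reaches one of the copies.
orders[0]["Address"] = "789 Elm St"
addresses = {row["Address"] for row in orders if row["Customer"] == "Alice"}
print(sorted(addresses))  # ['123 Main St', '789 Elm St']: two answers for one customer

# Normalized design: the address is stored once and referenced by key.
customers = {"C1": {"Name": "Alice", "Address": "123 Main St"}}
orders_normalized = [{"OrderID": 1, "CustomerID": "C1"}, {"OrderID": 2, "CustomerID": "C1"}]
customers["C1"]["Address"] = "789 Elm St"  # one update, every order sees it
print({customers[o["CustomerID"]]["Address"] for o in orders_normalized})  # {'789 Elm St'}
```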

Normal Forms: The Rules of the Game

Normal forms are a set of rules that guide the normalization process. Each normal
form builds upon the previous one, addressing specific types of redundancy and
data anomalies.

1. First Normal Form (1NF)

• Eliminate Repeating Groups: 1NF tackles the issue of repeating groups
of data within a single column. A repeating group occurs when multiple
values are stored in a single cell (e.g., listing multiple phone numbers in
one cell).

• Atomic Values: To achieve 1NF, ensure that each column contains only
atomic values (single, indivisible units of data).

• Create Separate Tables: If you have repeating groups, create new tables
to store those repeating values and link them to the main table using foreign
keys.

Example:

Unnormalized Table:

StudentID | Name  | PhoneNumbers
----------+-------+-------------------
1         | Alice | 555-1234, 555-5678
2         | Bob   | 555-9012

1NF Tables:

Student Table:

StudentID | Name
----------+------
1         | Alice
2         | Bob

PhoneNumbers Table:

PhoneID | StudentID | PhoneNumber
--------+-----------+------------
1       | 1         | 555-1234
2       | 1         | 555-5678
3       | 2         | 555-9012
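The 1NF split can be sketched in a few lines of Python. This assumes, as in the example, that the multi-valued cell separates numbers with commas, and it generates PhoneID as a simple sequential surrogate key.

```python
# Unnormalized rows: the PhoneNumbers cell holds a comma-separated list.
unnormalized = [
    (1, "Alice", "555-1234, 555-5678"),
    (2, "Bob", "555-9012"),
]

student = []        # (StudentID, Name)
phone_numbers = []  # (PhoneID, StudentID, PhoneNumber)
next_phone_id = 1
for student_id, name, phones in unnormalized:
    student.append((student_id, name))
    # Split the repeating group into one row per value (atomic values only);
    # StudentID is kept as the foreign key back to the Student table.
    for number in phones.split(","):
        phone_numbers.append((next_phone_id, student_id, number.strip()))
        next_phone_id += 1

print(student)        # [(1, 'Alice'), (2, 'Bob')]
print(phone_numbers)  # [(1, 1, '555-1234'), (2, 1, '555-5678'), (3, 2, '555-9012')]
```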

2. Second Normal Form (2NF)

• Build on 1NF: A table must already be in 1NF to be considered for 2NF.

• Eliminate Partial Dependencies: 2NF addresses partial dependencies,
where a non-key attribute depends on only part of the primary key (in cases
where the primary key is composite, meaning it consists of multiple
columns).

• Create New Tables: To achieve 2NF, create new tables for data that
depends on only part of the primary key.

Example:

1NF Table with Partial Dependency:

OrderID | ProductID | ProductName | Quantity
--------+-----------+-------------+---------
1       | A1        | Laptop      | 2
2       | B2        | Mouse       | 5

Here, the primary key is {OrderID, ProductID}. ProductName depends only on
ProductID, not on the entire primary key.

2NF Tables:

Order Table:

OrderID | ProductID | Quantity
--------+-----------+---------
1       | A1        | 2
2       | B2        | 5

Product Table:

ProductID | ProductName
----------+------------
A1        | Laptop
B2        | Mouse
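A Python sketch of the same 2NF decomposition. The conflict check is an addition, not part of the example: moving ProductName into its own table is safe only if ProductID really determines ProductName, so the code stops if it finds one ProductID with two different names.

```python
# 1NF rows: (OrderID, ProductID, ProductName, Quantity); key is {OrderID, ProductID}.
rows = [
    (1, "A1", "Laptop", 2),
    (2, "B2", "Mouse", 5),
]

order_table = []    # Quantity depends on the whole key, so it stays here.
product_table = {}  # ProductName depends only on ProductID, so it moves out.
for order_id, product_id, product_name, quantity in rows:
    order_table.append((order_id, product_id, quantity))
    # If one ProductID appeared with two names, ProductID -> ProductName would
    # not hold and this split would lose information, so check it.
    known = product_table.setdefault(product_id, product_name)
    if known != product_name:
        raise ValueError(f"{product_id} has conflicting names: {known!r}, {product_name!r}")

print(order_table)                    # [(1, 'A1', 2), (2, 'B2', 5)]
print(sorted(product_table.items()))  # [('A1', 'Laptop'), ('B2', 'Mouse')]
```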

3. Third Normal Form (3NF)


• Build on 2NF: A table must already be in 2NF to be considered for 3NF.

• Eliminate Transitive Dependencies: 3NF tackles transitive
dependencies, where a non-key attribute depends on another non-key
attribute, and therefore on the primary key only indirectly.

• Create New Tables: To achieve 3NF, create new tables to remove
transitive dependencies.

Example:

2NF Table with Transitive Dependency:

EmployeeID | Name  | Department | DepartmentLocation
-----------+-------+------------+-------------------
1          | Alice | Sales      | London
2          | Bob   | Marketing  | New York

Here, DepartmentLocation depends on Department, which is a non-key attribute.

3NF Tables:

Employee Table:

EmployeeID | Name  | Department
-----------+-------+-----------
1          | Alice | Sales
2          | Bob   | Marketing

Department Table:

Department | DepartmentLocation
-----------+-------------------
Sales      | London
Marketing  | New York
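Here's the 3NF design as SQLite tables (via Python's `sqlite3` module), with Department as a foreign key from Employee. The move of Sales to Paris is hypothetical. It shows that the location is now updated in a single row, and that a join still reproduces the original four-column view of the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- DepartmentLocation now depends on the key of its own table.
    CREATE TABLE Department (
        Department         TEXT PRIMARY KEY,
        DepartmentLocation TEXT NOT NULL
    );
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        Department TEXT REFERENCES Department(Department)
    );
    INSERT INTO Department VALUES ('Sales', 'London'), ('Marketing', 'New York');
    INSERT INTO Employee VALUES (1, 'Alice', 'Sales'), (2, 'Bob', 'Marketing');
""")

# Moving a department is now a single-row update ...
conn.execute("UPDATE Department SET DepartmentLocation = 'Paris' WHERE Department = 'Sales'")

# ... and a join reconstructs the original 2NF view of the data.
rows = conn.execute("""
    SELECT e.EmployeeID, e.Name, e.Department, d.DepartmentLocation
    FROM Employee AS e JOIN Department AS d ON d.Department = e.Department
    ORDER BY e.EmployeeID
""").fetchall()
print(rows)  # [(1, 'Alice', 'Sales', 'Paris'), (2, 'Bob', 'Marketing', 'New York')]
```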

By applying these normal forms, you can create a well-structured database that is
efficient, consistent, and easier to manage. Keep in mind that while higher normal
forms generally lead to better database design, there might be situations where
denormalization (intentionally introducing some redundancy) is beneficial for
performance reasons. However, this should be done cautiously and with a clear
understanding of the trade-offs.

Applying Normalization: A Step-by-Step Guide

Normalization is an iterative process. You start with an unnormalized table and
progressively apply the normal forms to refine the design. Here's a practical guide
to walk you through the steps:

Step 1: Identify the Entities and Attributes

• Analyze the Data: Begin by examining the data you need to store. Identify
the key entities (e.g., customers, products, orders) and their associated
attributes (e.g., customer name, product price, order date).

• Create an Initial Table: Often, you'll start with a single table containing
all the data. This is usually in an unnormalized form.

Step 2: First Normal Form (1NF)

• Eliminate Repeating Groups: Identify any columns that contain
repeating groups of data (e.g., multiple phone numbers in a single cell).

• Create New Tables: For each repeating group, create a new table to store
those values.

• Establish Relationships: Use foreign keys to link the new tables to the
original table.

Step 3: Second Normal Form (2NF)

• Check for Composite Keys: If your tables have composite primary keys
(keys consisting of multiple columns), check for partial dependencies.

• Identify Partial Dependencies: A partial dependency exists when a
non-key attribute depends on only part of the primary key.

• Create New Tables: Create new tables to store the attributes that are
partially dependent on the primary key.

Step 4: Third Normal Form (3NF)

• Identify Transitive Dependencies: Look for transitive dependencies,
where a non-key attribute depends on another non-key attribute.

• Create New Tables: Create new tables to remove transitive dependencies.

Step 5: Review and Refine

• Examine Relationships: Ensure that the relationships between tables are
clear and correctly represented using foreign keys.

• Consider Denormalization: In some cases, you might intentionally
introduce some redundancy (denormalization) to improve performance.
However, this should be done carefully and with a clear understanding of
the trade-offs.

Example:

Let's say we have an unnormalized table for orders:

Unnormalized Table:

OrderID | CustomerName | CustomerAddress | ProductID | ProductName | Quantity
--------+--------------+-----------------+-----------+-------------+---------
1       | Alice        | 123 Main St     | A1        | Laptop      | 2
1       | Alice        | 123 Main St     | B2        | Mouse       | 1
2       | Bob          | 456 Oak Ave     | A1        | Laptop      | 1

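1NF:

• Every cell already holds a single, atomic value, so the table is already in
1NF. Its primary key is the composite {OrderID, ProductID}.

• The table is still highly redundant, though: Alice's name and address, and
the name "Laptop", are each stored more than once.

2NF:

• ProductName depends only on ProductID, and CustomerName and
CustomerAddress depend only on OrderID. Both are partial dependencies on
the composite key.

• We move the product data into a "Product" table and the order-level data
into an "Order" table. Only Quantity, which depends on the whole key, stays
in an "OrderItem" table.

OrderItem Table:

OrderID | ProductID | Quantity
--------+-----------+---------
1       | A1        | 2
1       | B2        | 1
2       | A1        | 1

Product Table:

ProductID | ProductName
----------+------------
A1        | Laptop
B2        | Mouse

Order Table:

OrderID | CustomerName | CustomerAddress
--------+--------------+----------------
1       | Alice        | 123 Main St
2       | Bob          | 456 Oak Ave

3NF:

• In the Order table, CustomerName and CustomerAddress describe the
customer, not the order: they depend on OrderID only through the customer
who placed it (OrderID → customer → name and address). This is a
transitive dependency, and a second order from Alice would store her
address again.

• We introduce a CustomerID, keep only that key in the "Order" table, and
move the customer details into a "Customer" table.

Order Table:

OrderID | CustomerID
--------+-----------
1       | C1
2       | C2

Customer Table:

CustomerID | CustomerName | CustomerAddress
-----------+--------------+----------------
C1         | Alice        | 123 Main St
C2         | Bob          | 456 Oak Ave

• Each non-key attribute now depends only on the primary key of its own
table, so all four tables (OrderItem, Product, Order, Customer) are in 3NF.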

This step-by-step process helps you systematically refine your database design to
achieve higher normal forms, leading to a more efficient and robust database.
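Identifying dependencies, as in Steps 3 and 4, can be partly mechanized. The sketch below (the helper `fd_holds` is hypothetical) tests functional dependencies against the sample rows of the unnormalized order table and lists every non-key column that is determined by only part of the key {OrderID, ProductID}. Sample data can only disprove a dependency, never prove it; the real dependencies come from the business rules, so treat the output as a hint to confirm.

```python
# Unnormalized order rows from the example above.
COLUMNS = ["OrderID", "CustomerName", "CustomerAddress", "ProductID", "ProductName", "Quantity"]
ROWS = [
    (1, "Alice", "123 Main St", "A1", "Laptop", 2),
    (1, "Alice", "123 Main St", "B2", "Mouse", 1),
    (2, "Bob", "456 Oak Ave", "A1", "Laptop", 1),
]


def fd_holds(lhs, rhs, rows=ROWS, columns=COLUMNS):
    """True if no two rows agree on every lhs column but differ on an rhs column."""
    li = [columns.index(c) for c in lhs]
    ri = [columns.index(c) for c in rhs]
    seen = {}
    for row in rows:
        left = tuple(row[i] for i in li)
        right = tuple(row[i] for i in ri)
        if seen.setdefault(left, right) != right:
            return False  # same left-hand values, different right-hand values
    return True


KEY = ("OrderID", "ProductID")
# A partial dependency: a non-key column determined by one part of the key.
partial = {
    col: part
    for col in COLUMNS if col not in KEY
    for part in KEY if fd_holds([part], [col])
}
print(fd_holds(KEY, COLUMNS))  # True: {OrderID, ProductID} identifies every row
print(partial)  # {'CustomerName': 'OrderID', 'CustomerAddress': 'OrderID', 'ProductName': 'ProductID'}
```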
