CCS356 OOSE Assignment
BY
A.CAROLINE
910022104005
1. DATA FLOW DIAGRAM (DFD)
A data flow diagram (DFD) shows the way information flows through a process or system. It
includes data inputs and outputs, data stores, and the various subprocesses the data moves
through. DFDs are built using standardized symbols and notation to describe various entities
and their relationships.
Data flow diagrams visually represent systems and processes that would be hard to describe
in just words. You can use these diagrams to map out an existing system and make it better or
to plan out a new system for implementation. Visualizing each element makes it easy to
identify inefficiencies and produce the best possible system.
a) TYPES
1. Logical data flow diagram
2. Physical data flow diagram
Logical data flow diagrams focus on what happens in an information flow: the activities the
system carries out and the data those activities use, independent of how the system is
implemented.
Physical data flow diagrams focus on how things happen in an information flow.
These diagrams specify the software, hardware, files, and people involved in an
information flow.
A detailed physical data flow diagram can facilitate the development of the code
needed to implement a data system.
Both physical and logical data flow diagrams can describe the same information flow.
In coordination they provide more detail than either diagram would independently. As
you decide which to use, keep in mind that you may need both.
b) LEVELS
Data flow diagrams are also categorized by level. Starting with the most basic, level 0,
DFDs get increasingly complex as the level increases. As you build your own data flow
diagram, you will need to decide which level your diagram will be.
Level 0 DFDs
Level 0 DFDs, also known as context diagrams, are the most basic data flow diagrams. They
provide a broad view that is easily digestible but offers little detail. Level 0 data flow
diagrams show a single process node and its connections to external entities. For instance, the
example shown below illustrates the hotel reservation process with the flow of information
between admin and guests.
Level 1 DFDs
Level 1 DFDs are still a general overview, but they go into more detail than a context
diagram. In a level 1 DFD, the single process node from the context diagram is broken down
into sub-processes. As these processes are added, the diagram will need additional data flows
and data stores to link them together. In the hotel reservation example, this can include
adding the room selection and inquiry processes to the reservation system, as well as data
stores.
Level 2+ DFDs
Level 2+ DFDs simply break processes down into more detailed sub-processes. In theory,
DFDs could go beyond level 3, but they rarely do. Level 3 data flow diagrams are detailed
enough that it doesn’t usually make sense to break them down further.
The level 2 diagram below expands on the hotel reservation process to include more granular
processes involved, such as the cancellation and confirmation processes and subsequent
connected data flows.
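The leveling idea can also be sketched outside a diagramming tool. The short Python snippet below is purely illustrative (the process names are assumptions based on the hotel reservation example above): each DFD level is represented as a nested dictionary, where a process maps to the sub-processes the next level breaks it into.

# Illustrative only: the hotel reservation DFD represented as nested dicts.
# Keys are process names; values are the sub-processes revealed at the next level.
hotel_reservation_dfd = {
    "0. Hotel reservation system": {          # Level 0: a single process node
        "1. Handle inquiry": {},              # Level 1 sub-processes
        "2. Select room": {},
        "3. Manage reservation": {            # Level 2 breaks this one down further
            "3.1 Confirm reservation": {},
            "3.2 Cancel reservation": {},
        },
    }
}

def print_levels(processes, depth=0):
    """Print the process hierarchy, indenting one step per DFD level."""
    for name, subprocesses in processes.items():
        print("  " * depth + name)
        print_levels(subprocesses, depth + 1)

print_levels(hotel_reservation_dfd)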
c) REAL-TIME EXAMPLES
2. ER-DIAGRAM
An Entity–relationship model (ER model) describes the structure of a database with the
help of a diagram, which is known as an Entity Relationship Diagram (ER Diagram). An
ER model is a design or blueprint of a database that can later be implemented as a
database. The main components of the E-R model are the entity set and the relationship set.
DEFINITION:
An ER diagram shows the relationship among entity sets. An entity set is a group of
similar entities, and these entities can have attributes. In terms of DBMS, an entity is a
table or an attribute of a table in a database, so by showing the relationships among tables and
their attributes, an ER diagram shows the complete logical structure of a database. Let's have
a look at a simple ER diagram to understand this concept.
A simple ER Diagram:
In the following diagram we have two entities, Student and College, and their relationship. The
relationship between Student and College is many-to-one, as a college can have many students
but a student cannot study in multiple colleges at the same time. The Student entity has
attributes such as Stu_Id, Stu_Name, and Stu_Addr, and the College entity has attributes such as
Col_ID and Col_Name.
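As a rough sketch of how this many-to-one relationship might be realized in a relational database (the table and column names are taken from the diagram above; the SQL layout itself is an assumption), the Student table carries a foreign key pointing to the College table:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite only enforces foreign keys when this is on
conn.executescript("""
CREATE TABLE College (
    Col_ID   INTEGER PRIMARY KEY,
    Col_Name TEXT NOT NULL
);
CREATE TABLE Student (
    Stu_Id   INTEGER PRIMARY KEY,
    Stu_Name TEXT NOT NULL,
    Stu_Addr TEXT,
    Col_ID   INTEGER NOT NULL REFERENCES College(Col_ID)  -- many students point to one college
);
""")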
Here are the geometric shapes and their meanings in an E-R diagram. We will discuss these
terms in detail in the next section (Components of an ER Diagram) of this guide, so don't worry
too much about them now; just go through them once.
Components of an ER Diagram
1. Entity
An entity is an object or component of data; it is represented by a rectangle in an ER diagram.
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on its
relationship with another entity is called a weak entity. A weak entity is represented by a
double rectangle. For example, a bank account cannot be uniquely identified without
knowing the bank to which the account belongs, so a bank account is a weak entity.
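A minimal sketch of how a weak entity could be modelled in SQL (the column names here are assumptions): the account only makes sense together with its identifying bank, so the primary key is composite.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Bank (
    bank_id   INTEGER PRIMARY KEY,
    bank_name TEXT NOT NULL
);
-- Weak entity: identified only together with its owning bank,
-- so the primary key combines the bank's key with the account number.
CREATE TABLE Bank_Account (
    bank_id        INTEGER NOT NULL REFERENCES Bank(bank_id),
    account_number TEXT    NOT NULL,
    balance        REAL    DEFAULT 0,
    PRIMARY KEY (bank_id, account_number)
);
""")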
2. Attribute
An attribute describes a property of an entity and is represented by an oval. There are four
types of attributes:
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity from an entity set. For example, a student roll
number can uniquely identify a student from a set of students. A key attribute is represented by
an oval, like other attributes; however, the text of the key attribute is underlined.
2. Composite attribute:
An attribute that is a combination of other attributes is known as a composite attribute. For
example, a student's address is a composite attribute, as an address is composed of other
attributes such as street, city, state, and pin code.
3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It is represented
by a double oval in an ER diagram. For example, a person can have more than one phone
number, so the phone number attribute is multivalued.
4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It is
represented by a dashed oval in an ER diagram. For example, a person's age is a derived
attribute, as it changes over time and can be derived from another attribute (date of birth).
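Because a derived attribute is computed rather than stored, an application typically calculates it on demand. A small Python sketch (the function name and signature are my own) deriving age from a stored date of birth:

from datetime import date
from typing import Optional

def derive_age(date_of_birth: date, today: Optional[date] = None) -> int:
    """Derive the age attribute from date_of_birth instead of storing it."""
    today = today or date.today()
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

print(derive_age(date(2000, 6, 15)))   # age as of today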
3. Relationship
A relationship is represented by a diamond shape and shows the association among entities.
There are four types of relationships:
1. One to One Relationship
When a single instance of an entity is associated with a single instance of another entity, it
is called a one-to-one relationship. For example, a person has only one passport, and a
passport is given to one person.
2. One to Many Relationship
When a single instance of an entity is associated with more than one instance of another
entity, it is called a one-to-many relationship. For example, a customer can place many
orders, but an order cannot be placed by many customers.
3. Many to One Relationship
When more than one instance of an entity is associated with a single instance of another
entity, it is called a many-to-one relationship. For example, many students can study in a
single college, but a student cannot study in many colleges at the same time.
4. Many to Many Relationship
When more than one instance of an entity is associated with more than one instance of
another entity, it is called a many-to-many relationship. For example, a student can be assigned
to many projects, and a project can be assigned to many students (see the sketch below).
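The three cardinalities above can be sketched in SQL roughly as follows (an assumed schema built from the passport, order, and project examples in this section): a unique foreign key gives one-to-one, a plain foreign key on the "many" side gives one-to-many or many-to-one, and a junction table gives many-to-many.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person (person_id INTEGER PRIMARY KEY, name TEXT);
-- One-to-one: the UNIQUE foreign key means a person can appear on at most one passport.
CREATE TABLE Passport (
    passport_no TEXT PRIMARY KEY,
    person_id   INTEGER NOT NULL UNIQUE REFERENCES Person(person_id)
);

CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT);
-- One-to-many: each order points at exactly one customer; a customer can have many orders.
CREATE TABLE "Order" (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customer(customer_id)
);

CREATE TABLE Student (stu_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Project (project_id INTEGER PRIMARY KEY, title TEXT);
-- Many-to-many: a junction table pairs students with projects.
CREATE TABLE Assignment (
    stu_id     INTEGER NOT NULL REFERENCES Student(stu_id),
    project_id INTEGER NOT NULL REFERENCES Project(project_id),
    PRIMARY KEY (stu_id, project_id)
);
""")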
Total Participation of an Entity set
Total participation of an entity set represents that each entity in entity set must have at least
one relationship in a relationship set. It is also called mandatory participation. For
example: In the following diagram, each college must have at least one associated Student.
Total participation is represented using a double line between the entity set and relationship
set.
Partial Participation of an Entity set
Partial participation of an entity set represents that each entity in the entity set may or may
not participate in the relationship instance in that relationship set. It is also called optional
participation.
Partial participation is represented using a single line between the entity set and relationship
set.
Example: Consider an example of an IT company. There are many employees working for
the company. Let's take the example of the relationship between the employee entity and the
software engineer role. Every software engineer is an employee, but not every employee is a
software engineer, as there are employees in other roles as well, such as housekeeping staff,
managers, and the CEO. So we can say that the participation of the employee entity set in the
software engineer relationship is partial (see the sketch below).
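A rough way to see this in a schema (an assumed sketch, not a full equivalent of the double/single line notation): every software engineer row must reference an employee, so its participation toward Employee is total, while an employee may or may not have a matching row, so the Employee side's participation is partial. Note that a rule like "every college must have at least one student" cannot be expressed as a simple column constraint and usually needs triggers or application logic.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
-- Total participation of Software_Engineer: every row MUST reference an employee.
-- Partial participation of Employee: an employee may have no row here at all.
CREATE TABLE Software_Engineer (
    emp_id               INTEGER PRIMARY KEY REFERENCES Employee(emp_id),
    programming_language TEXT
);
""")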
3. NORMALIZATION IN DBMS
Normalization is the process of organizing data in a database so that redundancy is reduced
and data integrity is preserved. It offers several benefits:
Data integrity
By ensuring that each piece of information is stored in only one place, normalization keeps
data accurate and consistent.
Efficient querying
Let’s consider a complex database with multiple related tables that stores redundant
information. In this scenario, queries involving joins become more complicated and resource-
intensive. Normalization will help simplify querying by breaking down data into smaller
tables, with each table containing only relevant information, thereby reducing the need for
complex joins.
Storage optimization
A major problem with redundant data is that it occupies unnecessary storage space. For
instance, if we store the same product details in every order record, it leads to duplication.
With normalization, you can eliminate redundancy by splitting data into separate tables.
Normalization plays a crucial role in database design. Here are several reasons why it’s
essential:
If a table is not properly normalized and has data redundancy, it will not only take up extra
data storage space but also make it difficult to handle and update the database.
There are several factors that drive the need for normalization, from data redundancy (as
covered above) to difficulty managing relationships. Let's get right into it:
Insertion, deletion, and update anomalies: Any form of change in a table can lead
to errors or inconsistencies in other tables if not handled carefully. These changes can
either be adding new data to a database, updating the data, or deleting records, which
can lead to unintended loss of data.
Difficulty in managing relationships: It becomes more challenging to maintain
complex relationships in an unnormalized structure.
Other factors that drive the need for normalization are partial
dependencies and transitive dependencies, in which partial dependencies can lead to
data redundancy and update anomalies, and transitive dependencies can lead to data
anomalies. We will be looking at how these dependencies can be dealt with to ensure
database normalization in the coming sections.
In this section, we will briefly discuss the different normalization levels and then explore
them deeper in the next section.
1NF: This normalization level ensures that each column in your data contains only atomic
values. Atomic, in this context, means that each entry in a column is indivisible. It is like
saying that each cell in a spreadsheet should hold just one piece of information. 1NF ensures
atomicity of data, with each column cell containing only a single value and each column
having a unique name.
2NF: Eliminates partial dependencies by ensuring that non-key attributes depend only on the
primary key. What this means, in essence, is that there should be a direct relationship
between each column and the primary key, and not between other columns.
3NF: Removes transitive dependencies by ensuring that non-key attributes depend only on the
primary key. This level of normalization builds on 2NF.
BCNF: This is a stricter version of 3NF that addresses additional anomalies. At this
normalization level, every determinant is a candidate key.
4NF: Addresses multi-valued dependencies, ensuring that independent multi-valued facts
about an entity are stored in separate tables.
5NF: The highest normalization level, which addresses join dependencies. It is used in specific
scenarios to further minimize redundancy by breaking a table into smaller tables.
First Normal Form (1NF)
1NF ensures that each column cell contains only atomic values. Imagine a library database
with a table storing book information (title, author, genre, and borrowed_by). If the table is
not normalized, borrowed_by could contain a list of borrower names separated by commas.
This violates 1NF, as a single cell holds multiple values. The table below is a good
representation of a table that violates 1NF, as described earlier.
title                  author            genre    borrowed_by
The Lord of the Rings  J. R. R. Tolkien  Fantasy  Emily Garcia, David Lee
The solution?
In 1NF, we create a separate table for borrowers and link them to the book table. These tables
can either be linked using the foreign key in the borrower table or a separate linking table.
The foreign key in the borrowers table approach involves adding a foreign key column to the
borrowers table that references the primary key of the books table. This will enforce a
relationship between the tables, ensuring data consistency.
Books table
Borrowers table

borrower_id (PK)  name           book_id (FK)
1                 John Doe       1
2                 Jane Doe       1
3                 James Brown    1
4                 Emily Garcia   2
5                 David Lee      2
6                 Michael Chen   3
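A minimal SQL sketch of the 1NF design above (assuming the "foreign key in the borrowers table" approach; column names follow the tables shown):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    author  TEXT NOT NULL,
    genre   TEXT
);
-- Each borrower row holds exactly one name (atomic values), satisfying 1NF.
CREATE TABLE borrowers (
    borrower_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    book_id     INTEGER REFERENCES books(book_id)
);
""")
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)",
                 [(2, "The Lord of the Rings", "J. R. R. Tolkien", "Fantasy")])
conn.executemany("INSERT INTO borrowers VALUES (?, ?, ?)",
                 [(4, "Emily Garcia", 2), (5, "David Lee", 2)])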
Second Normal Form (2NF)
From the 1NF that was implemented, we already have two separate tables (you can check the
1NF section).
Now, let’s say we want to link these tables to record borrowings. The initial approach might
be to simply add a borrower_id column to the books table, as shown below:
book_id (PK)  title                  author            genre    borrower_id (FK)
2             The Lord of the Rings  J. R. R. Tolkien  Fantasy  NULL
This might look like a solution, but it violates 2NF simply because the borrower_id only
partially depends on the book_id. A book can have multiple borrowers, but a single
borrower_id can only be linked to one book in this structure. This creates a partial
dependency.
The solution?
We need to model the many-to-many relationship between books and borrowers to achieve
2NF. This can be done by introducing a separate table:
Book_borrowings table
borrowing_id (PK) book_id (FK) borrower_id (FK) borrowed_date
1 1 1 2024-05-04
2 2 4 2024-05-04
3 3 6 2024-05-04
This table establishes a clear relationship between books and borrowers. The book_id and
borrower_id act as foreign keys, referencing the primary keys in their respective tables. This
approach ensures that borrower_id depends on the entire primary key (book_id) of the books
table, complying with 2NF.
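A sketch of the 2NF structure in SQL (an assumption of how the tables above could be declared; once the junction table exists, the book_id column can be dropped from borrowers):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE books     (book_id     INTEGER PRIMARY KEY, title TEXT, author TEXT, genre TEXT);
CREATE TABLE borrowers (borrower_id INTEGER PRIMARY KEY, name  TEXT);

-- Junction table: models the many-to-many relationship between books and borrowers.
CREATE TABLE book_borrowings (
    borrowing_id  INTEGER PRIMARY KEY,
    book_id       INTEGER NOT NULL REFERENCES books(book_id),
    borrower_id   INTEGER NOT NULL REFERENCES borrowers(borrower_id),
    borrowed_date TEXT    NOT NULL
);
""")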
Third Normal Form (3NF)
From the 2NF we already implemented, there are three tables in our library database:
Books table

Borrowers table

borrower_id (PK)  name           book_id (FK)
1                 John Doe       1
2                 Jane Doe       1
3                 James Brown    1
4                 Emily Garcia   2
5                 David Lee      2
6                 Michael Chen   3
Book_borrowings table

borrowing_id (PK)  book_id (FK)  borrower_id (FK)  borrowed_date
1                  1             1                 2024-05-04
2                  2             4                 2024-05-04
3                  3             6                 2024-05-04
The 2NF structure looks efficient, but there might be a hidden dependency. Imagine we add a
due_date column to the books table. This might seem logical at first sight, but it’s going to
create a transitive dependency where:
The due_date column depends on the borrowing_id (a non-key attribute) from the
book_borrowings table.
The borrowing_id in turn depends on book_id (the primary key) of the books table.
The solution?
We can move the due_date column to the most appropriate table by updating the
book_borrowings table to include the due_date and returned_date columns.
borrowing_id (PK)  book_id (FK)  borrower_id (FK)  borrowed_date  due_date
1                  1             1                 2024-05-04     2024-05-20
2                  2             4                 2024-05-04     2024-05-18
3                  3             6                 2024-05-04     2024-05-10
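A sketch of the 3NF fix in SQL (assumed DDL): the dates that describe a borrowing live in book_borrowings, not in books, so no non-key column depends on another non-key column.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE books     (book_id     INTEGER PRIMARY KEY, title TEXT, author TEXT, genre TEXT);
CREATE TABLE borrowers (borrower_id INTEGER PRIMARY KEY, name  TEXT);

CREATE TABLE book_borrowings (
    borrowing_id  INTEGER PRIMARY KEY,
    book_id       INTEGER NOT NULL REFERENCES books(book_id),
    borrower_id   INTEGER NOT NULL REFERENCES borrowers(borrower_id),
    borrowed_date TEXT    NOT NULL,
    due_date      TEXT,            -- describes the borrowing, so it belongs here,
    returned_date TEXT             -- not in the books table
);
""")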
Boyce-Codd Normal Form (BCNF)
BCNF is based on functional dependencies that consider all candidate keys in a relation.
BCNF is a stricter version of 3NF. It ensures that every determinant (a set of attributes that
uniquely identifies a row) in a table is a candidate key (a minimal set of attributes that uniquely
identifies a row). The whole essence of this is that all determinants should be able to serve as
primary keys.
It ensures that every functional dependency (FD) has a superkey as its determinant. In other
words, if X —> Y (X determines Y) holds, X must be a candidate key (superkey) of the
relation. Please note that X and Y are columns in a data table.
Books table
Borrowers table
borrower_id (PK) name book_id (FK)
1 John Doe 1
2 Jane Doe 1
3 James Brown 1
4 Emily Garcia 2
5 David Lee 2
6 Michael Chen 3
Book_borrowings table

borrowing_id (PK)  book_id (FK)  borrower_id (FK)  borrowed_date  due_date
1                  1             1                 2024-05-04     2024-05-20
2                  2             4                 2024-05-04     2024-05-18
3                  3             6                 2024-05-04     2024-05-10
While the 3NF structure is good, there might be a hidden determinant in the
book_borrowings table. Assuming one borrower cannot borrow the same book twice
simultaneously, the combination of book_id and borrower_id together uniquely identifies a
borrowing record.
This structure violates BCNF, since this combined determinant (book_id, borrower_id) is not
the declared primary key of the table (which is just borrowing_id).
The solution?
To achieve BCNF, we can either decompose the book_borrowings table into two separate
tables or make the combined attribute set the primary key.
1. Approach 1 (decompose the book_borrowings table): Split it into:
- A table with borrowing_id as the primary key, borrowed_date, due_date, and
returned_date.
- Another separate table to link books and borrowers, with book_id as a foreign
key, borrower_id as a foreign key, and potentially additional attributes specific
to the borrowing event.
2. Approach 2 (make the combined attribute set the primary key): We can consider
making book_id and borrower_id a composite primary key for uniquely identifying
borrowing records. The problem with this approach is that it won’t serve its purpose if
a borrower can borrow the same book multiple times.
In the end, your choice between these options depends on your specific data needs and how
you want to model borrowing relationships.
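A sketch of approach 2 in SQL (an assumption, valid only if a borrower cannot borrow the same book more than once): make the (book_id, borrower_id) pair the primary key, so the determinant is also the key.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE books     (book_id     INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE borrowers (borrower_id INTEGER PRIMARY KEY, name  TEXT);

-- The determinant (book_id, borrower_id) is now the table's key.
CREATE TABLE book_borrowings (
    book_id       INTEGER NOT NULL REFERENCES books(book_id),
    borrower_id   INTEGER NOT NULL REFERENCES borrowers(borrower_id),
    borrowed_date TEXT    NOT NULL,
    due_date      TEXT,
    returned_date TEXT,
    PRIMARY KEY (book_id, borrower_id)   -- a repeat borrowing of the same book is rejected
);
""")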
Fourth Normal Form (4NF)
The library example we’ve been using throughout these explanations is not applicable at this
normalization level. 4NF typically applies to situations where a single attribute might have
multiple dependent attributes that don’t directly relate to the primary key.
Let’s use another scenario. Imagine a database that stores information about publications. We
will consider a “Publications” table with the columns publication_id, title, author,
publication_year, and keywords.
Publications table

publication_id (PK)  title  author  publication_year  keywords

Because the keywords column can hold several independent values for the same publication,
it creates a multi-valued dependency on publication_id, which violates 4NF.
The solution?
We split the keywords into a separate table, with one row per publication-keyword pair:

Publication_keywords table

publication_id (FK)  keyword
1                    Coming-of-Age
1                    Legal
2                    Fantasy
2                    Epic
2                    Adventure
3                    Romance
3                    Social Commentary
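A sketch of the 4NF split in SQL (assumed table and column names based on the tables above): each independent keyword becomes a separate row instead of a multi-valued column.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE publications (
    publication_id   INTEGER PRIMARY KEY,
    title            TEXT NOT NULL,
    author           TEXT,
    publication_year INTEGER
);
-- One row per (publication, keyword) pair removes the multi-valued dependency.
CREATE TABLE publication_keywords (
    publication_id INTEGER NOT NULL REFERENCES publications(publication_id),
    keyword        TEXT    NOT NULL,
    PRIMARY KEY (publication_id, keyword)
);
""")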
Fifth Normal Form (5NF)
5NF is the most complex form of normalization and eliminates join dependencies. A join
dependency is a situation where data needs to be joined from multiple tables to answer a
specific query, even when those tables are already in 4NF.
In simpler terms, 5NF ensures that no additional information can be derived by joining the
tables together that wasn’t already available in the separate tables.
Join dependencies are less likely to occur when tables are already normalized (in 3NF or
4NF), hence the difficulty in creating a clear and straightforward example for 5NF.
However, let’s take a look at this scenario where 5NF might be relevant:
Imagine a university database with normalized tables for “Courses” and “Enrollments.”
Courses table

Enrollments table

enrollment_id  student_id  course_id  grade
1              12345       101        A
2              12345       202        B
3              56789       301        A-
4              56789       401        B+
Assuming these tables are already in 3NF or 4NF, a join dependency might exist depending
on how data is stored. For instance, a course has a prerequisite requirement stored within the
“Courses” table as the “prerequisite_course_id” column.
This might seem efficient at first glance. However, consider a query that needs to retrieve a
student’s enrolled courses and their respective prerequisites. In this scenario, you would need
to join the “Courses” and “Enrollments” tables, then potentially join the “Courses” table to
retrieve prerequisite information.
The Solution?
To potentially eliminate the join dependency and achieve 5NF, we could introduce a separate
“Course Prerequisites” table:
Course_prerequisites table

course_id  prerequisite_course_id
301        NULL
401        202
This approach separates prerequisite information and allows efficient retrieval of enrolled
courses and their prerequisites in a single join between the “Enrollments” and
“Course_prerequisites” tables.
Note: We are assuming that each course can have only one prerequisite.
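A sketch of this in SQL (an assumed schema, following the note above that each course has at most one prerequisite), showing how the separate prerequisites table lets a single join answer the query:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE courses     (course_id INTEGER PRIMARY KEY, course_name TEXT);
CREATE TABLE enrollments (enrollment_id INTEGER PRIMARY KEY,
                          student_id    INTEGER NOT NULL,
                          course_id     INTEGER NOT NULL REFERENCES courses(course_id),
                          grade         TEXT);
-- Assumes at most one prerequisite per course (see the note above).
CREATE TABLE course_prerequisites (
    course_id              INTEGER PRIMARY KEY REFERENCES courses(course_id),
    prerequisite_course_id INTEGER REFERENCES courses(course_id)
);
""")

# One join is enough to list a student's enrolled courses with their prerequisites.
rows = conn.execute("""
    SELECT e.student_id, e.course_id, p.prerequisite_course_id
    FROM enrollments e
    LEFT JOIN course_prerequisites p ON p.course_id = e.course_id
""").fetchall()
print(rows)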
5NF is a very complex and rare type of normalization, so as someone just starting their
learning journey in data, you might not find many opportunities to apply it. However, knowing
it will be useful and will prepare you for when you come across complex databases.