Hand-Out 2
12.2.1 Data Redundancy of Unit 12
+ Redundancy means having multiple copies of same data in the
database.
+ This problem arises when a database is not normalized.
+ Redundant data is a bad idea because when you modify data, you need
to do it in more than one place.
Data redundancy problems/anomalies
* Insertion anomalies
* Deletion anomalies
* Updation anomalies
Data redundancy leads to
+ data inconsistency
+ larger storage requirements
+ slower processing
+ increased operational costs
Example of redundancy
The attributes of a table of student details are:
student ID, student name, college name, college rank, course opted.
student ID   name   contact      college   course   rank
100          Tom    0112729729   IT        B.Tech   1
101          Jack   0112588588   IT        B.Tech   1
102          Ai     0777849031   IT        B.Tech   1
103          Mary   0715681293   IT        B.Tech   1
As can be observed, the values of the attributes college name, college
rank and course are repeated, which can lead to problems.
Insertion anomaly
You cannot insert a new course unless a student takes the course.
student ID   name     contact   college   course    rank
(none)       (none)   (none)    IT        MSc. IT   5
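The restriction above can be tried out with a small sketch. It assumes the wide student table is stored in SQLite and that every row must carry a student name; the table name, column names and the NOT NULL rule are assumptions made for illustration, not part of the handout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: one wide table holding student AND college/course facts.
# Every row describes a student, so the name is required (NOT NULL).
conn.execute("""
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        college      TEXT,
        course       TEXT,
        college_rank INTEGER
    )
""")

def try_insert_course_only(connection):
    """Try to record a course that no student has taken yet."""
    try:
        connection.execute(
            "INSERT INTO student (college, course, college_rank) "
            "VALUES ('IT', 'MSc. IT', 5)"
        )
        return True
    except sqlite3.IntegrityError:
        # Rejected: there is no student to attach the course to.
        return False

inserted = try_insert_course_only(conn)
print(inserted)
```

The course can only be stored once some student data, real or made up, is supplied with it.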
This problem happens when the insertion of a data record is not
possible without adding some additional unrelated data to the record.
Deletion anomaly
If the details of the students in this table are deleted, then the details
of the college are deleted as well, which should not happen.
This anomaly happens when deletion of a data record results in losing
some unrelated information that was stored as part of the record that
was deleted from a table.
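A minimal sketch of this effect, again assuming the wide table lives in SQLite. The schema and the helper name known_colleges are illustrative assumptions; the two rows loosely follow the example table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        name         TEXT,
        college      TEXT,
        course       TEXT,
        college_rank INTEGER
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [(100, "Tom", "IT", "B.Tech", 1), (101, "Jack", "IT", "B.Tech", 1)],
)

def known_colleges(connection):
    """Colleges the database still knows about."""
    rows = connection.execute("SELECT DISTINCT college FROM student")
    return [row[0] for row in rows]

before = known_colleges(conn)        # the college is known via its students
conn.execute("DELETE FROM student")  # remove the student records only
after = known_colleges(conn)         # the college facts went with them

print(before, after)
```

Nothing about the college was meant to be deleted, yet no trace of it remains once the last student row is gone.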
Updation anomaly
If the rank of the college changes, the change has to be made in every
row that stores that rank, which is time-consuming and
computationally costly.
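The following sketch shows what happens when such a change reaches only one of the copies. The SQLite schema is an assumption for illustration and the rows loosely follow the example table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        name         TEXT,
        college      TEXT,
        college_rank INTEGER
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(100, "Tom", "IT", 1), (101, "Jack", "IT", 1), (102, "Ai", "IT", 1)],
)

# The college's rank changes, but the update touches one row only.
conn.execute("UPDATE student SET college_rank = 2 WHERE student_id = 100")

# The same college now has two different ranks on record.
ranks = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT college_rank FROM student WHERE college = 'IT'"
    )
)
print(ranks)
```

With the rank stored once, in a table of its own, a single UPDATE would have been enough.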
student ID   name   contact      college   course   rank
100          Tom    0112729729   IT        B.Tech   1
101          Jack   0112588588   IT        B.Tech   1
102          Ai     0777849031   IT        B.Tech   1
103          Mary   0715681293   IT        B.Tech   1
If the updation does not occur at all places, then the database will be in
an inconsistent state.
When is data redundancy a good thing?
When data is redundant, employees enjoy fast access and quick updates
because the necessary information is available on multiple systems. This is
particularly important for customer service-based organizations whose
customers expect promptness and efficiency.
In mission-critical scenarios, the loss of files can be compensated for if
the data is redundant. Data redundancy is desirable for backup purposes.
It is impossible to eliminate data redundancy in big data. Many Big Data
systems are designed to trade data consistency for processing time,
availability, and cost benefits.
12.2.2 Normalisation
Normalization is a technique of organizing the data in the database.
It is a systematic approach of decomposing tables to eliminate data
redundancy (repetition) and undesirable characteristics like insertion,
updation and deletion anomalies.
Normalization is used mainly for two purposes:
+ Eliminating redundant (useless) data.
+ Ensuring data dependencies make sense, i.e. data is logically stored.
www.youtube.com/watch?v=xoTyrdT9SZI
Steps in Normalisation
* First Normal Form - The information is stored in a relational table
with each column containing atomic values. There are no repeating
groups of columns (a column doesn't hold multiple values).
* Second Normal Form - The table is in first normal form and every
non-key column depends on the whole of the table's primary key.
* Third Normal Form - The table is in second normal form and none of its
non-key columns is transitively dependent on the primary key.
Primary Key
+ The PRIMARY KEY uniquely identifies each record in a table.
+ Primary keys must contain UNIQUE values, and cannot contain NULL
values.
+ When a primary key is made of multiple columns, it is called a
composite key.
+ Examples include
* Student ID in the student table, Vehicle ID in the vehicle table
* ISBN in a stock table in a book store, Book ID in a book table in a library
* Student ID + Subject ID together in a subject offered table
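The last example, a key made of Student ID + Subject ID, can be sketched in SQLite as follows. The table and column names are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite primary key: the PAIR (student_id, subject_id) must be unique
# and neither part may be NULL.
conn.execute("""
    CREATE TABLE subject_offered (
        student_id INTEGER NOT NULL,
        subject_id TEXT    NOT NULL,
        PRIMARY KEY (student_id, subject_id)
    )
""")

conn.execute("INSERT INTO subject_offered VALUES (101, 'OS')")
conn.execute("INSERT INTO subject_offered VALUES (101, 'CN')")  # same student, new subject: allowed

try:
    conn.execute("INSERT INTO subject_offered VALUES (101, 'OS')")  # same pair again
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

print(duplicate_accepted)
```

Neither column is unique on its own; only the combination identifies a row.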
Foreign Key
Foreign Key references the primary key of another Table! It helps
connect your Tables.
* The foreign key is the common field that links tables together.
+ A foreign key can have a different name from the primary key it references.
+ It ensures rows in one table have corresponding rows in another.
+ Unlike the primary key, foreign keys do not have to be unique.
+ A foreign key is marked with a * when writing tables.
First Normal Form (1NF)
For a table to be in the First Normal Form, it should follow the following
rules:
+ Single Valued Attributes (atomic) - Each column of your table should
be single valued which means they should not contain multiple values.
+ Values stored in a column should be of the same domain. For example
if the column is DoB, it must hold only date of birth and not some
other dates
+ All the columns in a table should have unique names.
+ The order in which data is stored does not matter.
1NF Example
Consider a table to hold student data which will have student's roll no.,
their name and the name of subjects they have opted for.
roll_no   name   subject
101       Jack   OS, CN
103       Ai     Java
102       Tom    C, C++
* The table already satisfies 3 rules out of the 4 rules.
* Multiple subject names are stored in a single column. But as per the
1NF rule, each column must contain atomic values.
How to solve this Problem?
It's very simple, because all we have to do is break the values into atomic values.
roll_no   name   subject
101       Jack   OS
101       Jack   CN
103       Ai     Java
102       Tom    C
102       Tom    C++
By doing so, a few values are repeated, but the values in the subject column
are now atomic for each record/row.
Using the First Normal Form, data redundancy increases, as the same data
appears in multiple rows, but each row as a whole will be unique.
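The step of breaking a multi-valued column into atomic values can be sketched in a few lines of Python. The function name to_1nf and the comma as separator are assumptions for illustration.

```python
def to_1nf(rows):
    """Split a comma-separated subject column into one row per subject."""
    atomic = []
    for roll_no, name, subjects in rows:
        for subject in subjects.split(","):
            subject = subject.strip()
            if subject:  # ignore empty pieces such as a trailing comma
                atomic.append((roll_no, name, subject))
    return atomic

unnormalised = [
    (101, "Jack", "OS, CN"),
    (103, "Ai", "Java"),
    (102, "Tom", "C, C++"),
]

atomic_rows = to_1nf(unnormalised)
for row in atomic_rows:
    print(row)
```

Three rows become five: roll_no and name are repeated, but every subject value is now atomic.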
What is the primary key of the table?
+ Before proceeding to 2NF, mark the primary key.
+ Is it roll_no? No, as it is not unique for each record.
+ The primary key is roll_no + subject.
+ We have a composite key for this table.
+ Is this the student table?
+ No, it is actually the SubjectOffered table.
+ What about the student table? It will be arrived at in 2NF, as name
depends only on roll_no, which is just part of the key.
+ SubjectOffered(roll_no, subject, name)
Notice the pattern...
+ In UNF (unnormalised form), you notice that one field looks unique (roll_no).
+ Once you make the data atomic, you end up with redundant data.
+ The primary key of your table is going to be a composite key.
+ The PK is the field that looked unique + the field(s) that you made atomic.
+ In this case, roll_no + subject.
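One way to check this pattern on the sample rows is to test whether a set of columns is unique across the table. The helper is_unique_key is a hypothetical name; passing the test on sample data shows only that the columns could serve as a key for these rows.

```python
def is_unique_key(rows, columns):
    """True if no two rows share the same values in the given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in columns)
        if key in seen:
            return False
        seen.add(key)
    return True

subject_offered = [
    {"roll_no": 101, "name": "Jack", "subject": "OS"},
    {"roll_no": 101, "name": "Jack", "subject": "CN"},
    {"roll_no": 103, "name": "Ai", "subject": "Java"},
    {"roll_no": 102, "name": "Tom", "subject": "C"},
    {"roll_no": 102, "name": "Tom", "subject": "C++"},
]

print(is_unique_key(subject_offered, ["roll_no"]))             # False
print(is_unique_key(subject_offered, ["roll_no", "subject"]))  # True
```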
Second Normal Form (2NF)
For a table to be in the Second Normal Form
* It should be in the First Normal Form.
* And, it should not have Partial Dependency.
Dependency
+ We can get the Date of Birth of a student with student_id (PK).
+ Similarly, we can get any other piece of data about a particular student if we know
the student_id.
+ So all you need is student_id; all the other columns depend on it, or can be
fetched using it.
+ This is Dependency, and we also call it Functional Dependency.
student_id   name   gender   DoB
101          Jack   Male     11/01/2007
102          Ai     Female   10/12/2008
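A functional dependency can be checked against sample rows with a short function: the left-hand columns determine the right-hand columns if equal left-hand values never come with different right-hand values. The function name is an assumption, and a check on sample data can disprove a dependency but never prove it for all future data.

```python
def functionally_determines(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs as well."""
    seen = {}
    for row in rows:
        left = tuple(row[column] for column in lhs)
        right = tuple(row[column] for column in rhs)
        # Remember the first right-hand value seen for each left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True

students = [
    {"student_id": 101, "name": "Jack", "gender": "Male", "dob": "11/01/2007"},
    {"student_id": 102, "name": "Ai", "gender": "Female", "dob": "10/12/2008"},
]

print(functionally_determines(students, ["student_id"], ["name", "gender", "dob"]))
```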
Partial Dependency
+ We have a Student table with student information and another table, Subject, for storing
subject information. Shown below is the Score table.
+ In tables where the primary key is composite, there can be instances where certain
columns depend on only part of the primary key.
+ In the table below, the primary key is student_id + subject_id.
+ Partial dependency can exist only if the table has a composite key.
student_id   subject_id   marks   teacher
...          Java_01      ...     Mrs. Bean
...          Python_02    75      Ms. Snake
...          Java_01      80      Mrs. Bean
2NF
Where is the Partial Dependency?
+ If you look at the Score table, we have a column teacher which
depends only on the subject: for Java it's the Java teacher, for C++
it's the C++ teacher, and so on.
+ The primary key for this table is a composite of two columns,
student_id and subject_id, but the teacher's name depends only
on subject_id.
* This is Partial Dependency, where an attribute in a table depends on
only a part of the primary key and not on the whole key.
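As a rough sketch, partial dependencies can be looked for by testing every non-key column against every proper part of the composite key. The function name is hypothetical, and the student_id values and the first mark (70) are made up for illustration; only the subject IDs, teachers and the marks 75 and 80 come from the Score table.

```python
from itertools import combinations

def partial_dependencies(rows, key, non_key):
    """List (key_part, column) pairs where column depends on part of the key."""
    found = []
    for size in range(1, len(key)):            # proper subsets of the key only
        for part in combinations(key, size):
            for column in non_key:
                seen = {}
                consistent = True
                for row in rows:
                    left = tuple(row[k] for k in part)
                    if seen.setdefault(left, row[column]) != row[column]:
                        consistent = False
                        break
                if consistent:
                    found.append((part, column))
    return found

score = [
    {"student_id": 101, "subject_id": "Java_01", "marks": 70, "teacher": "Mrs. Bean"},
    {"student_id": 101, "subject_id": "Python_02", "marks": 75, "teacher": "Ms. Snake"},
    {"student_id": 102, "subject_id": "Java_01", "marks": 80, "teacher": "Mrs. Bean"},
]

result = partial_dependencies(score, ["student_id", "subject_id"], ["marks", "teacher"])
print(result)
```

On these rows only teacher is reported, as depending on subject_id alone; marks needs both parts of the key.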
2NF
How to remove Partial Dependency?
* The solution is to remove column teacher from Score table and add
it to the Subject table. Hence, the Subject table will become:
+ Subject(Subject ID, Teacher)
+ Score(Student ID, Subject ID*, Marks)
+ The Score table will be without the Teacher column.
+ Move the partially dependent field to a table whose primary key is the field
on which it was partially dependent.
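The resulting Subject and Score tables could look like this in SQLite. This is a sketch under assumptions: the column types, the student_id values and the first mark are invented for illustration, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite

# Teacher now lives with the subject it depends on.
conn.execute("CREATE TABLE subject (subject_id TEXT PRIMARY KEY, teacher TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE score (
        student_id INTEGER NOT NULL,
        subject_id TEXT    NOT NULL REFERENCES subject(subject_id),
        marks      INTEGER,
        PRIMARY KEY (student_id, subject_id)
    )
""")

conn.executemany("INSERT INTO subject VALUES (?, ?)",
                 [("Java_01", "Mrs. Bean"), ("Python_02", "Ms. Snake")])
conn.executemany("INSERT INTO score VALUES (?, ?, ?)",
                 [(101, "Java_01", 70), (101, "Python_02", 75), (102, "Java_01", 80)])

# The teacher is still available through a join on the foreign key.
report = conn.execute("""
    SELECT s.student_id, s.subject_id, s.marks, sub.teacher
    FROM score AS s JOIN subject AS sub ON sub.subject_id = s.subject_id
    ORDER BY s.student_id, s.subject_id
""").fetchall()

# A score for a subject that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO score VALUES (103, 'C_03', 60)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

print(report, orphan_accepted)
```

Each teacher is stored once, so changing the teacher of a subject is a single-row update in the Subject table.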