
UNIT 3 Notes

Codd's Rules for Relational DBMS

E. F. Codd was a computer scientist who invented the relational model for database management. The relational database was created on the basis of this relational model. Codd proposed thirteen rules, popularly known as Codd's 12 rules (numbered 0 to 12), to test a DBMS against his relational model. Codd's rules define the qualities a DBMS requires in order to qualify as a Relational Database Management System (RDBMS). Till now, there is hardly any commercial product that follows all thirteen of Codd's rules; even Oracle follows only about eight and a half (8.5) of the 13. Codd's rules are as follows.

Rule 0: Foundation Rule

This rule states that for a system to qualify as an RDBMS, it must be able to manage the database entirely through its relational capabilities.

Rule 1: Information Rule

All information (including metadata) is to be represented as stored data in the cells of tables. The rows and columns have to be strictly unordered.

For example, in the Student table below, both the student information and the table's metadata must be stored as values in cells:

Erpid (pk)   SName   Marks
1            Neha    78
2            Karan   80

Rule 2: Guaranteed Access

Each unique piece of data (atomic value) should be accessible by the combination of Table Name + Primary Key (row) + Attribute (column).

Erpid (pk)   SName   Marks
1            Neha    78
2            Karan   80

For example, to find the marks of the student whose Erpid is 2 in this Student table, we can write:

select marks from student where Erpid = 2;

Rule 3: Systematic Treatment of NULL Values

This rule concerns the handling of NULLs in the database. A database consists of various types of data, and each column has its own datatype. If a cell value is unknown, not applicable or missing, it cannot be represented as zero or as an empty value; it is always represented as NULL. NULL must behave the same way irrespective of the datatype of the column, and logical or arithmetic operations on NULL must give consistent, correct results.

For example, adding NULL to the number 5 should result in NULL:

5 + unknown = unknown, so 5 + NULL = NULL (and 5 + NULL ≠ 5 or 0)

It should not result in zero or any other numeric value. The DBMS must be strong enough to handle these NULLs according to the situation and the datatypes.

NULL has several meanings: it can mean missing data, not applicable, or no value, and it must be handled consistently. Also, a primary key must never be NULL, and any expression on NULL must yield NULL.

Erpid (pk)   SName   Marks   Phone no.
1            Neha    78      7890675
2            Karan   80      NULL
3            Rohan   82      NULL

Here the primary key value can never be NULL, while the missing phone numbers are represented as NULL.
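
A minimal SQL sketch of this behaviour, assuming the Student table above with a numeric phone column named phone_no:

select SName, phone_no + 1 from student;            -- arithmetic on a NULL phone number yields NULL, never 0
select SName from student where phone_no is null;   -- NULL is tested with IS NULL, not with = NULL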

Rule 4: Active Online Catalog

This rule describes the data dictionary. Metadata should be maintained for all the data in the database, and this metadata should itself be stored as tables, rows and columns, with its own access privileges. In short, the metadata stored in the data dictionary should obey all the characteristics of a database, and it should always be correct and up to date. We should be able to access this metadata using the same query language that we use to access the database.

The database dictionary (catalog) is the structural description of the complete database, and it must be stored online. The catalog must be governed by the same rules as the rest of the database, and the same query language used to query the database must work on the catalog.

For example:

SELECT * FROM ALL_TABLES; -- ALL_TABLES is the Oracle data dictionary view describing the tables the user owns or has access to. It is queried with the same SQL that we use on ordinary tables in the database.

Rule 5: Powerful and Well-Structured Language

One well-structured language must provide all manner of access to the data stored in the database, for example SQL. If the database allows access to the data without using this language, that is a violation of the rule.

Erpid (pk)   SName   Marks
1            Neha    78
2            Karan   80

For example, to find the marks of the student whose Erpid is 2 in this Student table, we use a powerful, well-structured language such as SQL (Structured Query Language):

select marks from student where Erpid = 2;

Rule 6: View Updation Rule

All views that are theoretically updatable should be updatable by the system as well.

Views are virtual tables created by queries to show a partial view of a table; that is, a view is a subset of a table, containing only some of its rows and columns. This rule states that such views must also be updatable, just like the tables they are built on.

For example, for the Student table below, Student_view is a virtual table:

Student                            Student_view
Erpid (pk)   SName   Marks         Erpid (pk)   Marks
1            Neha    78            1            78
2            Karan   80            2            80

According to this rule, we should be able to update records through Student_view.
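
A minimal sketch of this, assuming the Student table and view above:

create view Student_view as select Erpid, Marks from student;

update Student_view set Marks = 85 where Erpid = 2;   -- the change flows through to the underlying Student table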

Rule 7: Relational Level Operations

Insert, delete and update operations must be supported at the level of whole relations (sets of rows), not just single rows. Set operations such as union, intersection and minus should also be supported.

For example:
Suppose employees get a 5% hike in a year; their salaries have to be updated to reflect the new values. Since this annual hike applies to all employees, the query should not have to update the salary one by one for thousands of employees. A single query should be able to update the entire set of employee salaries at once, as sketched below.
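
A minimal sketch of such a set-level statement, assuming an employee table with a salary column:

update employee set salary = salary * 1.05;   -- one statement raises every employee's salary by 5%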

Rule 8: Physical Data Independence

The physical storage of data should not matter to the system. If, say, a file supporting a table is renamed or moved from one disk to another, it should not affect the application.

For example:
If the data stored on one disk is transferred to another disk, a user viewing the data should not notice any difference or delay in access time; the data remains accessible exactly as before. Similarly, if the file backing a table is renamed, neither the table nor the user viewing it should be affected. This is known as physical data independence, and the database should support it.

Rule 9: Logical Data Independence

If there is a change in the logical structure (the table structures) of the database, the user's view of the data should not change. Say a table is split into two tables; a new view should then present the result as the combination of the two tables. This rule is the most difficult to satisfy.

For example:
If we split the EMPLOYEE table by department into multiple employee tables, a user viewing the employee data should not feel that the records are coming from different tables. The split tables should be combined again and shown as one result; in this example we can use UNION and display the combined result to the user, as sketched below.

In an ideal scenario, however, this is difficult to achieve, since the logical structures and the user views tend to be so tightly coupled that they are almost the same.
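
A minimal sketch of hiding such a split behind a view; the table names employee_dept10 and employee_dept20 are assumptions for illustration:

create view employee as
    select emp_id, emp_name, salary from employee_dept10
    union all
    select emp_id, emp_name, salary from employee_dept20;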

Rule 10: Integrity Independence

The database should be able to enforce its own integrity rather than relying on other programs. Key and check constraints, triggers, etc. should be stored in the data dictionary. This also makes the RDBMS independent of the front-end application.

For example:
Suppose an application tries to insert an employee for department 50, but department 50 does not exist in the system. The application should not have to check whether department 50 exists and insert it before inserting the employee; all of this should be handled by the database itself, as sketched below.
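
A minimal sketch of declaring such a rule inside the database; the table and column names are assumptions:

alter table employee
    add constraint fk_emp_dept
    foreign key (dept_id) references department (dept_id);
-- an insert for a non-existent department now fails automatically, with no front-end code involved
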
Rule 11: Distribution Independence

A database should work properly regardless of how it is distributed across a network.

Even if a database is geographically distributed, with its data stored in pieces, the end user should get the impression that it is stored in one place. This lays the foundation of distributed databases.

The database can be located on the user's own server or anywhere else on the network; the end user should not need to know which servers hold the data and should be able to fetch records as if they were stored locally. Even when the database is spread over different servers, the access time should not be noticeably worse.

Rule 12: Nonsubversion Rule

If low-level access to the system is allowed, it must not be able to subvert or bypass the integrity rules to change the data. This can be achieved by some form of locking or encryption.

For example:
A query that updates a student's address is ultimately converted into low-level operations that update the address record in the student file on disk. Those low-level operations must not be able to update any other record in the file or insert malicious records into the file or memory.
Mapping the Conceptual Model to the Relational Model

After designing the ER diagram of a system, we need to convert it into a relational model that can be implemented directly in any RDBMS, such as Oracle or MySQL.

Functional Dependency

A functional dependency is a relationship that exists between two attributes. It typically exists between the primary key and a non-key attribute within a table.

X → Y

The left side of the FD is known as the determinant; the right side is known as the dependent.

A functional dependency (FD) is a constraint that determines the relation of one attribute to another attribute in a Database Management System (DBMS). Functional dependencies help maintain the quality of data in the database and play a vital role in distinguishing good database design from bad.

A functional dependency is denoted by an arrow "→". The functional dependency of Y on X is represented by X → Y.

Example:

Employee number Employee Name Salary City

1 Dana 50000 San Francisco

2 Francis 38000 London

3 Andrew 25000 Tokyo

In this example, if we know the value of Employee number, we can obtain the Employee Name, City and Salary. Hence, we can say that City, Employee Name and Salary are functionally dependent on Employee number.
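
A minimal sketch of how such a dependency is typically enforced in SQL; the column types are assumptions. Making the employee number the primary key guarantees that each value determines exactly one row, and hence exactly one name, salary and city:

create table employee (
    employee_number int primary key,   -- the determinant
    employee_name   varchar(50),
    salary          int,
    city            varchar(30)
);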

Functional dependencies have six main inference rules:

1. Reflexive Rule (IR1)

In the reflexive rule, if Y is a subset of X, then X determines Y.

If X ⊇ Y then X → Y

Example:
If X = {a, b, c, d, e} and Y = {a, b, c}, then Y ⊆ X, so X → Y.
2. Augmentation Rule (IR2)

The augmentation rule states that if X determines Y, then XZ determines YZ for any Z.

If X → Y then XZ → YZ

Example:

For R(ABCD), if A → B then AC → BC


3. Transitive Rule (IR3)

The transitive rule states that if X determines Y and Y determines Z, then X must also determine Z.

If X → Y and Y → Z then X → Z
4. Union Rule (IR4)

The union rule says that if X determines Y and X determines Z, then X also determines Y and Z together.

If X → Y and X → Z then X → YZ

Proof:

1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1, augmenting with X, where XX = X)
4. XY → YZ (using IR2 on 2, augmenting with Y)
5. X → YZ (using IR3 on 3 and 4)

5. Decomposition Rule (IR5)

The decomposition rule is also known as the project rule. It is the reverse of the union rule. It says that if X determines Y and Z together, then X determines Y and X determines Z separately.

If X → YZ then X → Y and X → Z
Proof:

1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)

6. Pseudo transitive Rule (IR6)

The pseudotransitive rule states that if X determines Y and YZ determines W, then XZ determines W.

If X → Y and YZ → W then XZ → W

Proof:

1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1, augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)

Types of Functional dependencies in DBMS: (refer ppt)

Key and types of keys – already covered in unit 2 notes

Anomalies in DBMS

Normalization is the process of organizing the data in a database to avoid data redundancy, insertion anomalies, update anomalies and deletion anomalies.

There are three types of anomalies that occur when the database is not normalized: insertion, update and deletion anomalies. Let's take an example to understand this.

Example: Suppose a manufacturing company stores employee details in a table named employee that has four attributes: emp_id for the employee's id, emp_name for the employee's name, emp_address for the employee's address, and emp_dept for the department in which the employee works. At some point in time the table looks like this:

emp_id emp_name emp_address emp_dept

101 Rick Delhi D001

101 Rick Delhi D002

123 Maggie Agra D890

166 Glenn Chennai D900

166 Glenn Chennai D004

The above table is not normalized. Let us look at the problems we face when a table is not normalized.

Update anomaly: In the above table we have two rows for employee Rick because he belongs to two departments of the company. If we want to update Rick's address, we have to update it in both rows or the data will become inconsistent. If the correct address gets updated for one department but not the other, then as per the database Rick has two different addresses, which is not correct and leads to inconsistent data, as the sketch below shows.
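
A minimal sketch of the risk, using the employee table above; the new address value is invented purely for illustration:

update employee set emp_address = 'Mumbai' where emp_id = 101 and emp_dept = 'D001';
-- updates only one of Rick's two rows, leaving the other row with the old address and the data inconsistent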

Insert anomaly: Suppose a new employee joins the company who is still under training and not yet assigned to any department. We would not be able to insert this employee into the table if the emp_dept field does not allow NULLs.

Delete anomaly: Suppose at some point the company closes department D890. Deleting the rows having emp_dept as D890 would also delete the information about employee Maggie, since she is assigned only to this department, as the sketch below shows.
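
A minimal sketch of the delete anomaly, using the employee table above:

delete from employee where emp_dept = 'D890';
-- removes department D890, but also removes the only row that records employee Maggie at all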

To overcome these anomalies we need to normalize the data. The next section discusses normalization.

Normalization

o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate undesirable characteristics such as insertion, update and deletion anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o Normal forms are used to reduce redundancy in the database tables.

Here are the most commonly used normal forms:

o First normal form (1NF)
o Second normal form (2NF)
o Third normal form (3NF)

First normal form (1NF)

As per the rule of first normal form, an attribute (column) of a table cannot hold
multiple values. It should hold only atomic values.

Example: Suppose a company wants to store the names and contact details of
its employees. It creates a table that looks like this:

emp_id   emp_name   emp_address   emp_mobile
101      Herschel   New Delhi     8912312390
102      Jon        Kanpur        8812121212, 9900012222
103      Ron        Chennai       7778881212
104      Lester     Bangalore     9990000123, 8123450987

Two employees (Jon and Lester) have two mobile numbers each, so the company stored both numbers in the same field, as you can see in the table above.
This table is not in 1NF: the rule says "each attribute of a table must have atomic (single) values", and the emp_mobile values for Jon and Lester violate that rule.

To make the table comply with 1NF we should store the data like this:

emp_id emp_name emp_address emp_mobile

101 Herschel New Delhi 8912312390

102 Jon Kanpur 8812121212

102 Jon Kanpur 9900012222

103 Ron Chennai 7778881212

104 Lester Bangalore 9990000123

104 Lester Bangalore 8123450987
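
A minimal sketch of the 1NF design above; the column types and the choice of (emp_id, emp_mobile) as the key are assumptions:

create table employee (
    emp_id      int,
    emp_name    varchar(50),
    emp_address varchar(100),
    emp_mobile  varchar(15),
    primary key (emp_id, emp_mobile)   -- each row holds exactly one atomic mobile number
);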

Second normal form (2NF)


A table is said to be in 2NF if both of the following conditions hold:

o The table is in 1NF (first normal form)
o No non-prime attribute is dependent on a proper subset of any candidate key of the table.

An attribute that is not part of any candidate key is known as a non-prime attribute.

Example: Suppose a school wants to store data about teachers and the subjects they teach. They create a table that looks like this; since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.

teacher_id Subject teacher_age

111 Maths 38

111 Physics 38

222 Biology 38

333 Physics 40

333 Chemistry 40

Candidate key: {teacher_id, subject}
Non-prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age depends on teacher_id alone, which is a proper subset of the candidate key. This violates the 2NF rule: "no non-prime attribute may depend on a proper subset of any candidate key of the table".

To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:

teacher_id teacher_age
111 38

222 38

333 40

teacher_subject table:
teacher_id Subject

111 Maths

111 Physics

222 Biology

333 Physics

333 Chemistry

Now the tables comply with Second normal form (2NF).
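
A minimal sketch of the 2NF decomposition above, with assumed column types:

create table teacher_details (
    teacher_id  int primary key,
    teacher_age int
);

create table teacher_subject (
    teacher_id  int references teacher_details (teacher_id),
    subject     varchar(30),
    primary key (teacher_id, subject)
);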

Third Normal form (3NF)

A table design is said to be in 3NF if both of the following conditions hold:

o The table is in 2NF
o There is no transitive functional dependency of a non-prime attribute on any super key.

An attribute that is not part of any candidate key is known as a non-prime attribute.

In other words, a table is in 3NF if it is in 2NF and, for each functional dependency X → Y, at least one of the following conditions holds:
o X is a super key of the table
o Y is a prime attribute of the table

An attribute that is part of one of the candidate keys is known as a prime attribute.

Example: Suppose a company wants to store the complete address of each employee. They create a table named employee_details that looks like this:

emp_id emp_name emp_zip emp_state emp_city emp_district

1001 John 282005 UP Agra Dayal Bagh

1002 Ajeet 222008 TN Chennai M-City

1006 Lora 282007 TN Chennai Urrapakkam

1101 Lilly 292008 UK Pauri Bhagwan

1201 Steve 222999 MP Gwalior Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}, and so on
Candidate key: {emp_id}
Non-prime attributes: all attributes except emp_id, since they are not part of any candidate key.

Here, emp_state, emp_city and emp_district depend on emp_zip, and emp_zip depends on emp_id. That makes the non-prime attributes (emp_state, emp_city and emp_district) transitively dependent on the super key (emp_id), which violates the rule of 3NF.

To make this table comply with 3NF we have to break it into two tables to remove the transitive dependency:

employee table:

emp_id emp_name emp_zip

1001 John 282005

1002 Ajeet 222008

1006 Lora 282007

1101 Lilly 292008

1201 Steve 222999

employee_zip table:
emp_zip emp_state emp_city emp_district

282005 UP Agra Dayal Bagh

222008 TN Chennai M-City

282007 TN Chennai Urrapakkam

292008 UK Pauri Bhagwan

222999 MP Gwalior Ratan
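
A minimal sketch of the 3NF decomposition above, with assumed column types:

create table employee_zip (
    emp_zip      varchar(10) primary key,
    emp_state    varchar(30),
    emp_city     varchar(30),
    emp_district varchar(30)
);

create table employee (
    emp_id   int primary key,
    emp_name varchar(50),
    emp_zip  varchar(10) references employee_zip (emp_zip)
);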


ADVANTAGES OF NORMALIZATION

Here is why normalization is an attractive prospect in RDBMS concepts.

1) A smaller database can be maintained, as normalization eliminates duplicate data; the overall size of the database is reduced as a result.

2) Better performance is ensured, which is linked to the point above: as the database becomes smaller, passes through the data become faster and shorter, improving response time and speed.

3) Narrower tables are possible, as normalized tables are fine-tuned and have fewer columns, which allows more data records per page.

4) Fewer indexes per table ensure faster maintenance tasks (index rebuilds).

5) It also allows joining only the tables that are actually needed.

DISADVANTAGES OF NORMALIZATION

1) More tables to join: by spreading data out over more tables, the need to join tables increases and the task becomes more tedious. The database also becomes harder to understand.

2) Tables contain codes rather than real data, since repeated data is stored as codes (lookup keys) rather than the actual values; therefore, there is always a need to go to the lookup table.

3) The data model becomes difficult to query against, as it is optimized for applications rather than for ad hoc querying. (An ad hoc query is a query that cannot be determined before it is issued; it consists of SQL that is constructed dynamically, usually by desktop-friendly query tools.) Hence it is hard to model the database without knowing what the customer desires.

4) As the normal forms progress, performance becomes slower and slower.

5) Proper knowledge of the various normal forms is required to carry out the normalization process efficiently; careless use may lead to a terrible design full of major anomalies and data inconsistency.

Decomposition in DBMS

o When a relation in the relational model is not in an appropriate normal form, decomposition of the relation is required.
o In a database, decomposition breaks a table into multiple tables.
o If the relation has no proper decomposition, it may lead to problems such as loss of information.
o Decomposition is used to eliminate some of the problems of bad design, such as anomalies, inconsistencies, and redundancy.

Lossless Decomposition
o If no information is lost from the relation that is decomposed, the decomposition is lossless.
o A lossless decomposition guarantees that the join of the decomposed relations gives back the same relation that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed relations gives the original relation.

Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME


22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations, EMPLOYEE and DEPARTMENT:

EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida

DEPARTMENT table

DEPT_ID EMP_ID DEPT_NAME


827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID",
then the resultant relation will look like:

Employee ⋈ Department

EMP_ID   EMP_NAME    EMP_AGE   EMP_CITY    DEPT_ID   DEPT_NAME
22       Denim       28        Mumbai      827       Sales
33       Alina       25        Delhi       438       Marketing
46       Stephan     30        Bangalore   869       Finance
52       Katherine   36        Mumbai      575       Production
60       Jack        40        Noida       678       Testing

Hence, the decomposition is a lossless-join decomposition.
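
A minimal sketch of the join that re-assembles the original relation, using the tables and column names above:

select e.EMP_ID, e.EMP_NAME, e.EMP_AGE, e.EMP_CITY, d.DEPT_ID, d.DEPT_NAME
from   EMPLOYEE e
join   DEPARTMENT d on d.EMP_ID = e.EMP_ID;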

Lossy Decomposition

The decompositions R1, R2, ..., Rn of a relation schema R are said to be lossy if their natural join results in the addition of extraneous tuples compared with the original relation R.

Formally, let R be a relation and R1, R2, R3, ..., Rn be its decomposition. The decomposition is lossy if:

R ⊂ R1 ⨝ R2 ⨝ R3 ⨝ ... ⨝ Rn

Information is lost in the sense that extraneous tuples appear in the result of the natural join of the decompositions, so it is also referred to as careless decomposition. This happens when the common attribute of the sub-relations is not a super key of any of the sub-relations.

Dependency Preserving
o Dependency preservation is an important constraint on database decomposition.
o In a dependency-preserving decomposition, every dependency must be enforced by at least one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be part of R1 or R2, or be derivable from the combination of the functional dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with the functional dependency set {A → BC}. The relation R is decomposed into R1(ABC) and R2(AD); this is dependency preserving because the FD A → BC is part of relation R1(ABC).

Case study on normalization(I will send the links)
