UNIT 3 Dbms Notes
E.F. Codd was a computer scientist who invented the relational model for
database management. The relational database was created based on this model.
Codd proposed 13 rules (numbered 0 to 12), popularly known as Codd's 12 rules,
to test a DBMS against his relational model. Codd's rules define the qualities a
DBMS requires in order to qualify as a Relational Database Management System
(RDBMS). To date, hardly any commercial product follows all 13 of Codd's rules.
Even Oracle follows only eight and a half (8.5) of the 13. Codd's 12 rules are as
follows:
Rule zero
This rule states that for a system to qualify as an RDBMS, it must be able to
manage databases entirely through its relational capabilities.
Rule 2: Guaranteed Access Rule
Each unique piece of data (atomic value) should be accessible by: Table Name +
Primary Key (Row) + Attribute (Column).
Rule 3: Systematic Treatment of NULL Values
This rule concerns the handling of NULLs in the database. A database holds data
of many types, so different cells have different datatypes. If a cell's value is
unknown, not applicable or missing, it cannot be represented as zero or as an
empty value; it must always be represented as NULL. NULL should behave the same
way irrespective of the datatype of the cell, and when it is used in logical or
arithmetic operations, the result should be correct.
For example:
Adding NULL to the number 5 should result in NULL:
5 + NULL = NULL (not 5 or 0)
It should not result in zero or any other numeric value. The DBMS should be
strong enough to handle NULLs according to the situation and the datatypes.
NULL has several meanings: it can mean missing data, not applicable, or no value.
It should be handled consistently. Also, a primary key must never be NULL.
Arithmetic expressions involving NULL must give NULL.
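To see this behaviour in a real system, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory SQLite database. SQLite is our choice for illustration, not something the rule prescribes. Note that logical operators follow SQL's three-valued logic, so a logical expression involving NULL is not always NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arithmetic involving NULL yields NULL (None in Python), never 5 or 0.
sum_result = conn.execute("SELECT 5 + NULL").fetchone()[0]

# Comparing NULL with anything, even another NULL, is unknown (NULL), not true.
eq_result = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Three-valued logic: TRUE OR NULL is TRUE, FALSE AND NULL is FALSE.
or_result = conn.execute("SELECT 1 OR NULL").fetchone()[0]
and_result = conn.execute("SELECT 0 AND NULL").fetchone()[0]

# IS NULL is the correct way to test for a missing value.
is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

print(sum_result, eq_result, or_result, and_result, is_null)  # None None 1 0 1
conn.close()
```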
Rule 4: Active Online Catalog
This rule concerns the data dictionary. Metadata should be maintained for all the
data in the database, and this metadata should itself be stored as tables, rows
and columns, with access privileges. In short, the metadata stored in the data
dictionary should obey all the characteristics of a database, and it should be
correct and up to date. We should be able to access the metadata using the same
query language that we use to access the data.
For example:
SELECT * FROM ALL_TABLES; -- In Oracle, ALL_TABLES is the data dictionary view
describing the tables the current user has access to. It is queried with the same
SQL that we use for ordinary tables.
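SQLite offers a small illustration of the same idea: its catalog, `sqlite_master`, is itself a table that is read with ordinary SQL. This is a sketch using Python's `sqlite3` module; the `student` table is a hypothetical example, and `PRAGMA table_info` is SQLite-specific rather than standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# The catalog (sqlite_master) is itself a table, queried with ordinary SQL.
rows = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(rows)  # [('table', 'student')]

# Column-level metadata is also exposed as rows (second field is the column name).
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(columns)  # ['roll_no', 'name', 'marks']
conn.close()
```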
Rule 5: Comprehensive Data Sublanguage Rule
There must be one well-structured language that provides all manner of access to
the data stored in the database, for example SQL. If the database allows access
to the data without using this language, that is a violation.
For example, to find the marks of the student whose roll number is 2 in a student
table, we can use such a language, e.g. SQL (Structured Query Language).
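A minimal sketch of this query, using Python's `sqlite3` module with a hypothetical `student(roll_no, name, marks)` table and made-up rows. Naming the table, the primary-key value and the column is also exactly the Table Name + Primary Key + Attribute access path described earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO student (roll_no, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 78), (2, "Ravi", 85), (3, "Meena", 91)],
)

# Table name (student) + primary key value (roll_no = 2) + column (marks)
# pinpoints exactly one atomic value.
marks = conn.execute("SELECT marks FROM student WHERE roll_no = ?", (2,)).fetchone()[0]
print(marks)  # 85
conn.close()
```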
Rule 6: View Updating Rule
All views that are theoretically updatable should also be updatable by the
system.
Views are virtual tables created by queries to show a partial view of a table;
that is, a view is a subset of a table with only some of its rows and columns.
This rule states that views should be updatable just as their base tables are.
[Figure: Student table and Student_view]
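SQLite is a useful counterexample here: it does not make views updatable on its own, so a direct UPDATE on a view fails, and the update has to be mapped onto the base table by hand with an INSTEAD OF trigger. The sketch below assumes a hypothetical `student` table and a `student_view` over two of its columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER);
INSERT INTO student VALUES (1, 'Asha', 78), (2, 'Ravi', 85);
-- A view exposing only some columns of the base table.
CREATE VIEW student_view AS SELECT roll_no, name FROM student;
""")

# SQLite views are read-only by default, so a direct UPDATE fails.
try:
    conn.execute("UPDATE student_view SET name = 'Ravi K' WHERE roll_no = 2")
    direct_update_ok = True
except sqlite3.OperationalError:
    direct_update_ok = False

# An INSTEAD OF trigger tells SQLite how to apply a view update to the base table.
conn.execute("""
CREATE TRIGGER student_view_update INSTEAD OF UPDATE ON student_view
BEGIN
    UPDATE student SET name = NEW.name WHERE roll_no = OLD.roll_no;
END
""")
conn.execute("UPDATE student_view SET name = 'Ravi K' WHERE roll_no = 2")
new_name = conn.execute("SELECT name FROM student WHERE roll_no = 2").fetchone()[0]
print(direct_update_ok, new_name)  # False Ravi K
conn.close()
```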
Rule 7: High-Level Insert, Update and Delete
Insert, delete and update operations must be supported at each level of
relations, i.e. on whole sets of rows and not just one row at a time. Set
operations such as union, intersection and minus should also be supported.
For example:
Suppose every employee gets a 5% raise in a year. Their salaries then have to be
updated to reflect the new amounts. Since this annual raise applies to all
employees, we should not have to write a query that updates the salaries one by
one for thousands of employees; a single query should be able to update every
employee's salary at once.
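A sketch of the single set-level update, using Python's `sqlite3` module; the `employee` table and its salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
# Hypothetical employees and salaries.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Rick", 50000), (2, "Jon", 40000), (3, "Lester", 60000)],
)

# One set-level statement updates every row; no row-by-row loop is needed.
# Integer arithmetic (salary * 105 / 100) avoids float rounding; note that SQLite
# integer division truncates, which is exact for these sample salaries.
conn.execute("UPDATE employee SET salary = salary * 105 / 100")

salaries = [row[0] for row in conn.execute("SELECT salary FROM employee ORDER BY emp_id")]
print(salaries)  # [52500, 42000, 63000]
conn.close()
```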
Rule 8: Physical Data Independence
The physical storage of data should not matter to the system. If, say, a file
supporting a table is renamed or moved from one disk to another, it should not
affect the application.
For example:
If the data stored on one disk is transferred to another disk, the user viewing
the data should not notice any difference, such as a delay in access, and should
be able to access the data exactly as before. Similarly, if the file name for the
table is changed on disk, it should not affect the table or the user viewing it.
This is known as physical data independence, and the database should support it.
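A small sketch of physical independence with SQLite, assuming a hypothetical file-based `student` database: the file is moved (standing in for a move to another disk) and an index is added, yet the application's SQL query and its result stay the same. With an embedded database like SQLite the connection path itself does change; a client/server RDBMS hides even that from the application.

```python
import os
import shutil
import sqlite3
import tempfile

QUERY = "SELECT name FROM student ORDER BY roll_no"  # the application's query

workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "disk1_school.db")
new_path = os.path.join(workdir, "disk2_school.db")

conn = sqlite3.connect(old_path)
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.commit()
before = conn.execute(QUERY).fetchall()
conn.close()

# Physical change 1: the storage file is renamed/moved.
shutil.move(old_path, new_path)

# Physical change 2: a new access path (index) is created.
conn = sqlite3.connect(new_path)
conn.execute("CREATE INDEX idx_student_name ON student(name)")
after = conn.execute(QUERY).fetchall()  # same SQL text as before
conn.close()
shutil.rmtree(workdir)

print(before == after)  # True
```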
Rule 9: Logical Data Independence
Changes to the logical structure of the database (for example, splitting or
restructuring tables) should not require changes to the applications or user
views built on it.
For example:
If we split the EMPLOYEE table by department into multiple employee tables, the
user viewing the employee table should not be aware that the records are coming
from different tables. The split tables should be combined to show the result; in
our example we can use UNION and display the results to the user.
In practice, however, this is difficult to achieve, because the logical schema and
the user views are tied so closely that they are almost the same.
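A sketch of the UNION approach in SQLite, with hypothetical tables `employee_d10` and `employee_d20` standing for the per-department pieces. A view named `employee` recombines them with UNION ALL (UNION without duplicate elimination, which is safe here because the pieces do not overlap), so the user's query against `employee` does not change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- EMPLOYEE split by department into two hypothetical tables.
CREATE TABLE employee_d10 (emp_id INTEGER PRIMARY KEY, name TEXT, dept INTEGER);
CREATE TABLE employee_d20 (emp_id INTEGER PRIMARY KEY, name TEXT, dept INTEGER);
INSERT INTO employee_d10 VALUES (1, 'Rick', 10), (3, 'Lester', 10);
INSERT INTO employee_d20 VALUES (2, 'Jon', 20);

-- A view under the old table name recombines the pieces.
CREATE VIEW employee AS
    SELECT emp_id, name, dept FROM employee_d10
    UNION ALL
    SELECT emp_id, name, dept FROM employee_d20;
""")

# The user's query is the same as before the split.
rows = conn.execute("SELECT emp_id, name FROM employee ORDER BY emp_id").fetchall()
print(rows)  # [(1, 'Rick'), (2, 'Jon'), (3, 'Lester')]
conn.close()
```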
Rule 10: Integrity Independence
The database should be able to enforce its own integrity rather than relying on
other programs. Key constraints, check constraints, triggers, etc. should be
stored in the data dictionary. This also makes the RDBMS independent of the
front end.
For example:
Suppose we want to insert an employee for department 50 using an application,
but department 50 does not exist in the system. The application should not have
to check whether department 50 exists, insert the department if it does not, and
only then insert the employee. All of this should be handled by the database.
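A sketch of the database enforcing this itself with a foreign key, using SQLite through Python's `sqlite3` module; the `department` and `employee` tables are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (10, 'Sales');
""")

# The application simply inserts; the database itself checks the rule.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Rick', 50)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True: department 50 does not exist
conn.close()
```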
Rule 11: Distribution Independence
The database may be located on the user's server or anywhere else on the
network. The end user should not need to know which servers hold the database
and should be able to retrieve records as if they were stored locally. Even if
the database is distributed across different servers, users should be able to
access it as though it were in one place, with comparable access time.
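Rule 12: Non-Subversion Rule
If the system provides a low-level (record-at-a-time) interface, that interface
must not be usable to bypass the integrity rules and constraints enforced by the
relational language.
For example:
A query that updates a student's address should always be translated into
low-level operations that update only that student's address record in the
student file. It should not update any other record in the file or insert some
malicious record into the file.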
Mapping conceptual model to Relational model
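Functional Dependency
A functional dependency (FD) X → Y holds on a relation when any two tuples that
agree on the attributes X also agree on the attributes Y:
X → Y
The left side of an FD is known as the determinant, and the right side is known
as the dependent.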
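A small Python sketch that checks an FD against a set of rows, directly from the definition: rows that agree on X must agree on Y. The `fd_holds` helper and the employee rows are hypothetical. A single table instance can only show that an FD is violated; whether it holds in general is a property of the schema's meaning.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`.

    rows: list of dicts mapping attribute name -> value
    lhs, rhs: lists of attribute names
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on X must also agree on Y.
        if x in seen and seen[x] != y:
            return False
        seen.setdefault(x, y)
    return True

# Hypothetical employee rows: Rick works in two departments.
employee = [
    {"emp_id": 1, "emp_name": "Rick", "emp_dept": "D1"},
    {"emp_id": 1, "emp_name": "Rick", "emp_dept": "D2"},
    {"emp_id": 2, "emp_name": "Jon", "emp_dept": "D1"},
]

print(fd_holds(employee, ["emp_id"], ["emp_name"]))  # True
print(fd_holds(employee, ["emp_id"], ["emp_dept"]))  # False
```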
Inference Rules
1. Reflexive Rule (IR1)
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
Since X ⊇ Y, X → Y.
2. Augmentation Rule (IR2)
If X → Y then XZ → YZ
3. Transitive Rule (IR3)
If X → Y and Y → Z then X → Z
4. Union Rule (IR4)
If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1, augmenting with X, where XX = X)
4. XY → YZ (using IR2 on 2, augmenting with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5)
The decomposition rule is also known as the project rule. It is the reverse of the
union rule.
If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
X → Z follows in the same way, using YZ → Z.
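6. Pseudo-transitive Rule (IR6)
If X → Y and YZ → W then XZ → W
Proof:
1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1, augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)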
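These rules can be checked mechanically with the attribute-closure algorithm: X → Y follows from a set of FDs exactly when Y is contained in X+, the set of all attributes that X determines. This works because the inference rules are sound and complete. The sketch below is a minimal Python version using single-letter attribute names; the helper names `closure` and `implies` are our own.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds`.

    fds: list of (lhs, rhs) pairs, each a string of single-letter attributes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already contains the LHS, it must contain the RHS too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if `fds` logically imply lhs -> rhs (i.e. rhs is inside lhs+)."""
    return set(rhs) <= closure(lhs, fds)

# Union (IR4): X -> Y and X -> Z give X -> YZ
print(implies([("X", "Y"), ("X", "Z")], "X", "YZ"))   # True
# Decomposition (IR5): X -> YZ gives X -> Y
print(implies([("X", "YZ")], "X", "Y"))               # True
# Pseudo-transitivity (IR6): X -> Y and YZ -> W give XZ -> W
print(implies([("X", "Y"), ("YZ", "W")], "XZ", "W"))  # True
# Not derivable: X -> Y alone does not give Y -> X
print(implies([("X", "Y")], "Y", "X"))                # False
```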
Anomalies in DBMS
There are three types of anomalies that occur when the database is not
normalized: insertion, update and deletion anomalies. Let's take an example to
understand this.
The above table is not normalized. We will see the problems that we face when
a table is not normalized.
Update anomaly: In the above table we have two rows for employee Rick, as he
belongs to two departments of the company. If we want to update Rick's address,
we have to update it in both rows, or the data will become inconsistent. If the
correct address is updated for one department but not the other, then according
to the database Rick would have two different addresses, which is incorrect and
leads to inconsistent data.
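A minimal sketch reproducing the update anomaly in SQLite; the `emp` table, Rick's employee id, the addresses and the department codes are hypothetical values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized table: employee details repeated for every department (hypothetical data).
conn.execute("CREATE TABLE emp (emp_id INTEGER, emp_name TEXT, emp_address TEXT, emp_dept TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(101, "Rick", "Delhi", "D001"), (101, "Rick", "Delhi", "D002")],
)

# The address is changed in only one of Rick's rows...
conn.execute("UPDATE emp SET emp_address = 'Agra' WHERE emp_id = 101 AND emp_dept = 'D001'")

# ...so the database now holds two different addresses for the same employee.
addresses = sorted(r[0] for r in conn.execute(
    "SELECT DISTINCT emp_address FROM emp WHERE emp_id = 101"))
print(addresses)  # ['Agra', 'Delhi']
conn.close()
```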
Insert anomaly: Suppose a new employee joins the company who is under training
and not yet assigned to any department. We would not be able to insert this
employee into the table if the emp_dept field doesn't allow NULLs.
To overcome these anomalies we need to normalize the data. In the next section
we will discuss normalization.
Normalization
First Normal Form (1NF)
As per the rule of first normal form, an attribute (column) of a table cannot hold
multiple values. It should hold only atomic values.
Example: Suppose a company wants to store the names and contact details of
its employees. It creates a table that looks like this:
Two employees (Jon & Lester) have two mobile numbers each, so the company
stored both numbers in the same field, as you can see in the table above.
This table is not in 1NF, as the rule says "each attribute of a table must have
atomic (single) values", and the emp_mobile values for employees Jon & Lester
violate that rule.
To make the table comply with 1NF, we should have the data like this:
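Converting such a table to 1NF means giving each mobile number its own row, with the other columns repeated. A minimal Python sketch, assuming hypothetical rows and column values; only `emp_mobile` and the name Jon come from the text above.

```python
# Hypothetical non-1NF rows: (emp_id, emp_name, emp_address, emp_mobile),
# where emp_mobile may hold a comma-separated list of numbers.
unnormalized = [
    (101, "Rick", "New Delhi", "9000000001"),
    (102, "Jon", "Kanpur", "9000000002, 9000000003"),
]

def to_1nf(rows):
    """Split the multi-valued last column so each row holds one atomic value."""
    flat = []
    for emp_id, name, address, mobiles in rows:
        for mobile in mobiles.split(","):
            flat.append((emp_id, name, address, mobile.strip()))
    return flat

for row in to_1nf(unnormalized):
    print(row)
# (101, 'Rick', 'New Delhi', '9000000001')
# (102, 'Jon', 'Kanpur', '9000000002')
# (102, 'Jon', 'Kanpur', '9000000003')
```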
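Second Normal Form (2NF)
A table is in 2NF if it is in 1NF and no non-prime attribute is dependent on a
proper subset of any candidate key of the table.
Example: Suppose a school wants to store data about teachers and the subjects
they teach. Since a teacher can teach more than one subject, the table can have
multiple rows for the same teacher. They create a table that looks like this:
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate key: {teacher_id, subject}. Non-prime attribute: teacher_age. The table
is in 1NF, but teacher_age depends only on teacher_id, a proper subset of the
candidate key, so the table is not in 2NF.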
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id Subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
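Using the teacher rows above, a short Python sketch can project the original table onto the two new tables and join them back on teacher_id. Rebuilding the original rows shows that no information was lost, and each teacher's age is now stored only once.

```python
# Rows of the original table: (teacher_id, subject, teacher_age).
teacher = {
    (111, "Maths", 38), (111, "Physics", 38), (222, "Biology", 38),
    (333, "Physics", 40), (333, "Chemistry", 40),
}

# Project onto the two new tables (sets drop the duplicate rows).
teacher_details = {(tid, age) for tid, _, age in teacher}
teacher_subject = {(tid, subj) for tid, subj, _ in teacher}

# Natural join on teacher_id rebuilds the original rows.
age_of = dict(teacher_details)  # teacher_id -> teacher_age (teacher_id is the key)
rejoined = {(tid, subj, age_of[tid]) for tid, subj in teacher_subject}

print(sorted(teacher_details))  # [(111, 38), (222, 38), (333, 40)]
print(rejoined == teacher)      # True
```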
Third Normal Form (3NF)
A table is in 3NF if it is in 2NF and, for each non-trivial functional
dependency X → Y, at least one of the following conditions holds:
X is a super key of the table
Y is a prime attribute of the table (an attribute that is part of some candidate
key)
employee table:
employee_zip table:
emp_zip emp_state emp_city emp_district
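A 3NF check can be sketched in Python using attribute closure and a brute-force search for candidate keys (fine only for small schemas). The schema and FDs below are assumptions for illustration: `emp_id` and `emp_name` are hypothetical columns, and the FDs `emp_id → emp_name, emp_zip` and `emp_zip → emp_state, emp_city, emp_district` are assumed. Under them, the emp_zip dependency is transitive and violates 3NF, which is why the zip-related columns belong in a separate employee_zip table.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            # Skip supersets of keys already found (they are not minimal).
            if any(key <= attrs for key in keys):
                continue
            if closure(attrs, fds) == schema:
                keys.append(attrs)
    return keys

def violations_3nf(schema, fds):
    """Return the FDs X -> Y that satisfy none of the 3NF conditions."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) == schema
        rhs_prime = (rhs - lhs) <= prime
        if not (trivial or superkey or rhs_prime):
            bad.append((lhs, rhs))
    return bad

schema = {"emp_id", "emp_name", "emp_zip", "emp_state", "emp_city", "emp_district"}
fds = [
    ({"emp_id"}, {"emp_name", "emp_zip"}),
    ({"emp_zip"}, {"emp_state", "emp_city", "emp_district"}),
]
print(violations_3nf(schema, fds))  # only the emp_zip dependency is reported

zip_schema = {"emp_zip", "emp_state", "emp_city", "emp_district"}
print(violations_3nf(zip_schema, fds[1:]))  # []
```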
databases become smaller in size, passes through the data become faster and
3) Narrower tables are possible, as normalized tables are fine-tuned and have
fewer columns, which allows more data records per page.
4) Fewer indexes per table ensure faster maintenance tasks (index rebuilds).
5) It also makes it possible to join only the tables that are needed.
DISADVANTAGES OF NORMALIZATION
1) More tables to join: by spreading data out into more tables, the need to join
tables increases and the task becomes more tedious. The database becomes
stored as lines of codes rather than the true data. Therefore, there is always a
3) Data model becomes extremely difficult to query against as the data model is
optimized for applications, not for ad hoc querying. (Ad hoc query is a query
friendly query tools.). Hence it is hard to model the database without knowing
4) As the normal form progresses, performance becomes slower and slower.
normalization process efficiently. Careless use may lead to terrible design filled
Decomposition in DBMS
Lossless Decomposition
o If no information is lost from the relation that is decomposed, the
decomposition is lossless.
o A lossless decomposition guarantees that joining the decomposed relations
gives back exactly the relation that was decomposed.
o In other words, a decomposition is lossless if the natural join of all the
decomposed relations gives the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
EMPLOYEE table:
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table:
Now, when these two relations are joined on the common column "EMP_ID", the
resultant relation looks like this:
Employee ⋈ Department
The join gives back the original EMPLOYEE_DEPARTMENT relation, so the
decomposition is lossless.
Lossy Decomposition
The decomposition of a relation schema R into R1, R2, …, Rn is said to be lossy if
their natural join adds extraneous (spurious) tuples that were not in the original
relation R.
In a lossy decomposition, the common attributes of the sub-relations are not a
superkey of any of the sub-relations.
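A small Python sketch contrasting the two cases on a hypothetical relation (emp_id, emp_name, dept), with set-based projection and natural join; the names come from the EMPLOYEE example and the dept values are made up. Splitting on emp_id, a key, rejoins losslessly; splitting on dept, which is not a superkey of either part, produces spurious tuples.

```python
def project(r, cols, keep):
    """Projection of relation r (set of tuples with column list cols) onto keep."""
    idx = [cols.index(c) for c in keep]
    return {tuple(t[i] for i in idx) for t in r}

def natural_join(r1, cols1, r2, cols2):
    """Natural join on all common columns; returns (rows, output column list)."""
    common = [c for c in cols1 if c in cols2]
    out_cols = cols1 + [c for c in cols2 if c not in cols1]
    result = set()
    for t1 in r1:
        row1 = dict(zip(cols1, t1))
        for t2 in r2:
            row2 = dict(zip(cols2, t2))
            if all(row1[c] == row2[c] for c in common):
                merged = {**row2, **row1}
                result.add(tuple(merged[c] for c in out_cols))
    return result, out_cols

cols = ["emp_id", "emp_name", "dept"]
r = {(22, "Denim", "Sales"), (33, "Alina", "Sales"), (46, "Stephan", "HR")}

def decompose_and_rejoin(attrs1, attrs2):
    r1, r2 = project(r, cols, attrs1), project(r, cols, attrs2)
    joined, out_cols = natural_join(r1, attrs1, r2, attrs2)
    return project(joined, out_cols, cols)  # reorder columns to match r

lossless = decompose_and_rejoin(["emp_id", "emp_name"], ["emp_id", "dept"])
lossy = decompose_and_rejoin(["emp_id", "dept"], ["emp_name", "dept"])

print(lossless == r)       # True
print(sorted(lossy - r))   # [(22, 'Alina', 'Sales'), (33, 'Denim', 'Sales')]
```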
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least
one decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each dependency
of R must either be a part of R1 or R2 or be derivable from the combination of
the functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional
dependency set (A->BC). R is decomposed into R1(ABC) and R2(AD), which is
dependency preserving because the FD A->BC is a part of relation R1(ABC).
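Dependency preservation can be tested without computing the full closure F+ using the standard algorithm sketched below in Python: for each FD X → Y, repeatedly grow a set starting from X using only what each fragment Ri can infer, namely (result ∩ Ri)+ ∩ Ri, and check whether Y is reached. The first call uses the example above; the second, R(A, B, C) with A → B and B → C split into AB and AC, is a hypothetical case in which B → C is lost.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fds, parts):
    """True if the decomposition into `parts` preserves every FD in `fds`."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                part = set(part)
                # Attributes this fragment alone can infer from what we have so far.
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

print(preserves([("A", "BC")], ["ABC", "AD"]))            # True
print(preserves([("A", "B"), ("B", "C")], ["AB", "AC"]))  # False
```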