DBMS Notes
Unit-3
The form of a basic SQL query:
The basic form of an SQL query is as follows:
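In the commonly used SELECT-FROM-WHERE shape (select-list, from-list and qualification are placeholders, not actual table or column names):
SELECT [DISTINCT] select-list
FROM from-list
WHERE qualification
The select-list names the columns (or expressions) to return, the from-list names the tables involved, and the optional WHERE clause gives a qualification that filters the rows.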
The following SQL statement returns the cities (only distinct values) from both the "Customers"
and the "Suppliers" table:
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers;
Nested Queries:
Non-Correlated Nested Queries: The inner subquery is completely independent of the outer query; it can be evaluated once on its own, and its result is then used by the outer query.
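For instance, a minimal sketch using the Sailors and Reserves relations referred to later in these notes (finding the names of sailors who have reserved boat 103; the inner query can be evaluated on its own, independently of the outer query):
SELECT S.sname
FROM Sailors S
WHERE S.sid IN (SELECT R.sid
                FROM Reserves R
                WHERE R.bid = 103);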
The EXISTS operator is another set comparison operator, such as IN. It allows us to test whether
a set is nonempty. Thus, for each Sailor row S, we test whether the set of Reserves rows R such
that R.bid = 103 AND S.sid = R.sid is nonempty. If so, sailor S has reserved boat 103, and we
retrieve the name. The subquery clearly depends on the current row S and must be re-evaluated
for each row in Sailors. The occurrence of S in the subquery (in the form of the literal S.sid) is
called a correlation, and such queries are called correlated queries.
The inner query is dependent on the outer query.
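The correlated query described above can be written with EXISTS as follows (Sailors and Reserves are the same relations as in the earlier example):
SELECT S.sname
FROM Sailors S
WHERE EXISTS (SELECT *
              FROM Reserves R
              WHERE R.bid = 103
              AND R.sid = S.sid);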
Aggregate operators:
In addition to simply retrieving data, we often want to perform some computation or
summarization. As we noted earlier in this chapter, SQL allows the use of arithmetic expressions.
We now consider a powerful class of constructs for computing aggregate values such as MIN and
SUM. These features represent a significant extension of relational algebra. SQL supports five
aggregate operations, which can be applied on any
column, say A, of a relation:
1. COUNT ([DISTINCT] A): The number of (unique) values in the A column.
2. SUM ([DISTINCT] A): The sum of all (unique) values in the A column.
3. AVG ([DISTINCT] A): The average of all (unique) values in the A column.
4. MAX (A): The maximum value in the A column.
5. MIN (A): The minimum value in the A column.
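For example (a small sketch over the Sailors relation, assuming the textbook schema with columns sid, sname, rating, and age), the following queries count the distinct rating values and compute the average age of sailors with rating 10:
SELECT COUNT (DISTINCT S.rating)
FROM Sailors S;
SELECT AVG (S.age)
FROM Sailors S
WHERE S.rating = 10;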
Find the age of the youngest sailor for each rating level.
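Using the GROUP BY clause, this query can be written as follows (again over the Sailors relation):
SELECT S.rating, MIN (S.age)
FROM Sailors S
GROUP BY S.rating;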
Null Values:
Outer Joins:
What are SQL JOINs?
In SQL, JOINs are used to unite the rows of two or more tables, based on a column that is shared
between them.
There are four different types of JOINs: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN. Here, we will discuss the FULL OUTER JOIN.
What is a Full Outer Join in SQL?
The FULL OUTER JOIN (aka OUTER JOIN) is used to return all of the records that have values
in either the left or right table.
For example, a full outer join of a table of customers and a table of orders might return all
customers, including those without any orders, as well as all of the orders. Customers who have
made orders would be united with their orders using their customer id number.
A full outer join can return a lot of data, so before you use it, consider whether a more
conservative method might meet your needs.
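A minimal sketch of the customers-and-orders example above (Customers and Orders here are hypothetical tables that share a customer_id column):
SELECT c.customer_id, c.customer_name, o.order_id
FROM Customers c
FULL OUTER JOIN Orders o
ON c.customer_id = o.customer_id;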
Introduction:
Schema refinement is intended to address the problems caused by redundancy, using a refinement approach based on decompositions. Redundant storage of information is the root cause of these problems.
Purpose of Normalization:
Database normalization is a technique of organizing the data in a database so as to remove undesirable characteristics like insertion, update, and deletion anomalies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.
Normalization is used for mainly two purposes,
• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense i.e. data is logically stored.
Problems caused by redundancy
Storing the same information redundantly, that is, in more than one place within a database, can
lead to several problems.
Update anomalies − If data items are scattered and are not linked to each other properly, then it
could lead to strange situations. For example, when we try to update one data item having its
copies scattered over several places, a few instances get updated properly while a few others are
left with old values. Such instances leave the database in an inconsistent state.
Deletion anomalies − We try to delete a record, but parts of it are left undeleted because, unknown to us, the data is also saved somewhere else.
Insertion anomalies − We try to insert data into a record that does not exist at all.
Functional Dependency:
A functional dependency A1, A2, ..., An → B1, B2, ..., Bn says that if two tuples of a relation agree on the values of attributes A1, A2, ..., An, then those tuples must also have the same values for attributes B1, B2, ..., Bn. A functional dependency is represented by an arrow sign (→), that is, X → Y, where X functionally determines Y. The left-hand-side attributes determine the values of the attributes on the right-hand side.
Armstrong's Axioms
If F is a set of functional dependencies then the closure of F, denoted as F+, is the set of all
functional dependencies logically implied by F. Armstrong's Axioms are a set of rules that, when applied repeatedly, generate the closure of a set of functional dependencies.
Reflexivity rule − If α is a set of attributes and β is a subset of α, then α → β holds.
Augmentation rule − If α → β holds and γ is a set of attributes, then αγ → βγ also holds. That is, adding attributes to a dependency does not change the basic dependency.
Transitivity rule − If α → β holds and β → γ holds, then α → γ also holds. In α → β, α is said to functionally determine β.
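As a small worked example (A, B, C here are placeholder attributes): given A → B and B → C, the transitivity rule yields A → C, and augmenting A → B with the attribute C yields AC → BC. Repeatedly applying the three rules to a set F in this way generates its closure F+.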
Trivial Functional Dependency
Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is
called a trivial FD. Trivial FDs always hold.
Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial
FD.
Completely non-trivial − If an FD X → Y holds, where X ∩ Y = Φ, it is said to be a completely non-trivial FD.
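For example (using hypothetical attributes of an employee relation): {emp_id, emp_name} → emp_name is a trivial FD, emp_id → emp_name is a non-trivial FD, and it is also completely non-trivial because {emp_id} ∩ {emp_name} = Φ.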
Normalization Rule: Normalization rules are divided into the following normal forms.
1. First normal form
2. Second normal form
3. Third normal form
4. Boyce Codd Normal form (BCNF)
5. Fourth normal form (4NF).
Two employees (Brock and Cold) have two mobile numbers each, so the company stored them in the same field, as you can see in the table above. This table is not in 1NF.
So, we have to rearrange the table to convert it into first normal form, with each field holding only a single value.
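For illustration (the employee names come from the example above; the ids and mobile numbers are hypothetical values), the table before and after conversion might look like this:
emp_id emp_name emp_mobile
101 Brock 9999911111, 9999922222
102 Cold 8888811111, 8888822222
After conversion to 1NF, each mobile number is stored in a separate row:
emp_id emp_name emp_mobile
101 Brock 9999911111
101 Brock 9999922222
102 Cold 8888811111
102 Cold 8888822222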
Example: Suppose a school wants to store the data of teachers and the subjects they teach.
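The original table (reconstructable from the two decomposed tables shown below) looks like this:
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Here the candidate key is {teacher_id, subject}.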
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, as the rule says “no non-prime attribute is dependent on the proper subset of any candidate key of the table”.
To make the table comply with 2NF, we can break it into two tables like this:
Teacher_details table
Teacher_id Teacher_age
111 38
222 38
333 40
Teacher_subject table
Teacher_id Subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables comply with the second normal form (2NF).
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each functional dependency X → Y, at least one of the following conditions holds:
• X is a super key of the table
• Y is a prime attribute of the table
Example: Consider a table with the following fields, suppose a company wants to store the
complete address of each employee.
Non-prime attributes: all attributes except emp_id are non-prime, as they are not part of any candidate key. Here, emp_state, emp_city, and emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id. That makes the non-prime attributes (emp_state, emp_city, and emp_district) transitively dependent on the super key (emp_id), which violates the rule of 3NF.
To make this table comply with 3NF, we have to break it into two tables to remove the transitive dependency:
employee table
emp_id emp_name emp_zip
1001 John 282005
1002 Ajeet 222008
employee_zip table
emp_zip emp_state emp_city emp_district
282005 UP Agra DayalBagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan
Example: Suppose there is a company wherein employees work in more than one department.
They store the data like this:
emp_id emp_nationality emp_dept dept_type dept_no_of_emp
1001 Austrian Production and planning D001 200
1001 Austrian stores D001 250
1002 American design and technical support D134 100
1002 American Purchasing department D134 600
The table is not in BCNF, as neither emp_id nor emp_dept alone is a key.
To make the table comply with BCNF, we can break it into three tables like this:
emp_nationality table
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table
emp_dept dept_type dept_no_of_emp
Production and planning D001 200
stores D001 250
design and technical support D134 100
Purchasing department D134 600
emp_dept_mapping table
emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
emp_id → emp_nationality
emp_dept → {dept_type, dept_no_of_emp}
This is now in BCNF, as in both functional dependencies the left-hand side is a key.
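A minimal DDL sketch of this decomposition (the column types are assumptions chosen for illustration):
CREATE TABLE emp_nationality (
  emp_id INT PRIMARY KEY,
  emp_nationality VARCHAR(50)
);
CREATE TABLE emp_dept (
  emp_dept VARCHAR(100) PRIMARY KEY,
  dept_type VARCHAR(10),
  dept_no_of_emp INT
);
CREATE TABLE emp_dept_mapping (
  emp_id INT REFERENCES emp_nationality(emp_id),
  emp_dept VARCHAR(100) REFERENCES emp_dept(emp_dept),
  PRIMARY KEY (emp_id, emp_dept)
);
In each table, the left-hand side of every functional dependency that holds is a key, which is exactly the BCNF condition.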
Decomposition:
The decomposition of a relation schema R consists of replacing the relation schema by two (or
more) relation schemas that each contain a subset of the attributes of R and together include all
attributes in R. Intuitively, we want to store the information in any given instance of R by storing
projections of the instance.
Problems related to Decomposition:
1. Lossless join decomposition
2. Dependency preserving decomposition
Let R’ be the result of the natural join of the decomposed relations. If R’ = R, then it is a lossless decomposition; otherwise it is a lossy decomposition.
For Example:
Let the relation R have attributes R(A, B, C, D, E), together with a set of functional dependencies. Relation R is decomposed into R1 and R2, with attributes (B, C, D) and (A, C, E) respectively:
R1(B, C, D) R2(A, C, E)
     A   B   C   D   E
R1
R2
Now the table is filled in for the two sub-schemas R1 and R2: the cell for (Ri, X) gets an α if attribute X belongs to Ri.
     A   B   C   D   E
R1   -   α   α   α   -
R2   α   -   α   -   α
Next, we propagate α using the functional dependencies: for each functional dependency, whenever two or more rows have α in every column of its left-hand side, we also place α in the right-hand-side columns of those rows, replacing any non-α entries. This is repeated until no more cells can be filled. If at least one row is completely filled with α, the natural join of the two sub-schemas can reproduce the same relation R, so the decomposition is lossless; otherwise it is lossy.
     A   B   C   D   E
R1   α   α   α   α   α
R2   α   -   α   -   α
Here the row for R1 is completely filled with α; that is why the decomposition is lossless.
If the closure of the set of functional dependencies of the individual relations R1, R2, R3, …, Rn is equal to the closure of the set of functional dependencies of the main relation R (before decomposition), then we say the decomposition D is a dependency preserving decomposition.
For example, let R1(A, B, C) and R2(C, D). The FDs A → B and B → C hold in R1, and the FD C → D holds in R2. All the functional dependencies hold here; hence, this decomposition is dependency preserving.
For example: consider a bike manufacturing company which produces two colours (black and white) of each model every year.
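An illustrative instance might look like this (the model code and years are hypothetical values):
bike_model manuf_year color
M1001 2023 Black
M1001 2023 White
M1001 2024 Black
M1001 2024 White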
Here columns manuf_year and color are independent of each other and dependent on bike_model.
In this case these two columns are said to be multivalued dependent on bike_model. These
dependencies can be represented like this:
bike_model ->> manuf_year
bike_model ->> color
Denormalization:
Denormalization is a database optimization technique in which we add redundant data to the database to get rid of complex join operations. This is done to speed up database access.
Denormalization is done after normalization for improving the performance of the database. The
data from one table is included in another table to reduce the number of joins in the query and
hence helps in speeding up the performance.
Example: Suppose that after normalization we have two tables: first, the Student table and second, the Branch table. The Student table has the attributes Roll_no, Student_name, Age, and Branch_id.
The branch table is related to the Student table with Branch_id as the foreign key in the Student
table.
If we want the names of students along with the name of their branch, then we need to perform a join operation. The problem here is that if the tables are large, the join operation takes a lot of time. So, we can add the Branch_name data from the Branch table to the Student table; this reduces the time that would have been spent on the join operation and thus optimizes the database.
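A minimal sketch of the two alternatives (the table and column names follow the attributes mentioned above and are assumptions about their exact spelling):
-- Normalized schema: a join is needed to get the branch name with the student name
SELECT s.Student_name, b.Branch_name
FROM Student s
JOIN Branch b ON s.Branch_id = b.Branch_id;
-- Denormalized schema: Branch_name has been copied into the Student table, so no join is required
SELECT Student_name, Branch_name
FROM Student;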
Advantages of Denormalization:
• Query execution is fast since we have to join fewer tables.
Disadvantages of Denormalization:
• As data redundancy is present, update and insert operations are more expensive and take more time, since the same data may have to be changed or inserted in more than one place. Since we are not performing normalization, this results in redundant data.
• Data Integrity is not maintained in denormalization. As there is redundancy so data can be
inconsistent.