
DBMS Notes

DBMS Unit wise notes
Copyright © All Rights Reserved

Unit-3
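The form of a basic SQL query:
The basic form of an SQL query is as follows:
SELECT [DISTINCT] select-list
FROM from-list
WHERE qualification
Every query must have a SELECT clause, which specifies the columns to be retained in the result, and a FROM clause, which specifies a cross-product of tables. The optional WHERE clause specifies selection conditions on the tables mentioned in the FROM clause. The optional keyword DISTINCT removes duplicate rows from the result.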

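A minimal sketch of this form, using Python's built-in sqlite3 module so it runs without a database server. The Sailors columns (sid, sname, rating, age) follow the sailors example used later in these notes; the rows themselves are made-up sample data.

```python
import sqlite3

# In-memory database holding a small, made-up Sailors table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5),
     (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0)],
)

# SELECT [DISTINCT] select-list / FROM from-list / WHERE qualification
query = """
    SELECT DISTINCT S.rating
    FROM Sailors S
    WHERE S.age < 50
"""
ratings = sorted(row[0] for row in conn.execute(query))
print(ratings)  # prints [7, 10]
```

Without DISTINCT, rating 7 would appear twice, once for Dustin and once for Horatio.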
Union, Intersect and Except:


The UNION operator is used to combine the result-set of two or more SELECT statements.
• Every SELECT statement within UNION must have the same number of columns
• The columns must also have similar data types
• The columns in every SELECT statement must also be in the same order
Union Syntax:
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

The following SQL statement returns the cities (only distinct values) from both the "Customers" and the "Suppliers" tables:
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;
To keep duplicate values, use UNION ALL instead of UNION.
The INTERSECT operator returns only the rows that appear in the results of both SELECT statements.
Intersect Syntax:
SELECT column_name(s) FROM table1
INTERSECT
SELECT column_name(s) FROM table2;
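The EXCEPT operator returns the rows of the first SELECT statement that do not appear in the result of the second (Oracle traditionally uses the keyword MINUS for this).
Except Syntax: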
SELECT column_name(s) FROM table1
EXCEPT
SELECT column_name(s) FROM table2;
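The set operators can be tried side by side with Python's sqlite3 module. In this sketch the Customers and Suppliers tables hold only a City column, and the city values are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (City TEXT)")
conn.execute("CREATE TABLE Suppliers (City TEXT)")
# Hypothetical sample data; London appears twice among the customers.
conn.executemany("INSERT INTO Customers VALUES (?)",
                 [("Berlin",), ("London",), ("London",), ("Madrid",)])
conn.executemany("INSERT INTO Suppliers VALUES (?)",
                 [("London",), ("Paris",), ("Madrid",)])

def combine(operator):
    """Run 'Customers <operator> Suppliers' and return the cities in sorted order."""
    sql = (f"SELECT City FROM Customers {operator} "
           "SELECT City FROM Suppliers ORDER BY City")
    return [row[0] for row in conn.execute(sql)]

union_rows = combine("UNION")          # distinct cities from either table
union_all_rows = combine("UNION ALL")  # keeps duplicates
intersect_rows = combine("INTERSECT")  # cities present in both tables
except_rows = combine("EXCEPT")        # customer cities with no supplier
for name, rows in [("UNION", union_rows), ("UNION ALL", union_all_rows),
                   ("INTERSECT", intersect_rows), ("EXCEPT", except_rows)]:
    print(name, rows)
```

UNION and UNION ALL differ only in duplicate handling: London appears once in the UNION result but three times in the UNION ALL result.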

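Nested Queries:
A nested query is a query that has another query embedded within it; the embedded query is called a subquery and typically appears in the WHERE clause.
Non-Correlated Nested Queries: The inner subquery is completely independent of the outer query, so it can be evaluated once and its result then used by the outer query.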

Correlated Nested Queries:
Example: Find the names of sailors who have reserved boat number 103.
SELECT S.sname
FROM Sailors S
WHERE EXISTS (SELECT *
              FROM Reserves R
              WHERE R.bid = 103 AND R.sid = S.sid);
The EXISTS operator is another set comparison operator, like IN. It allows us to test whether a set is nonempty. Thus, for each Sailors row S, we test whether the set of Reserves rows R such that R.bid = 103 AND S.sid = R.sid is nonempty. If so, sailor S has reserved boat 103, and we retrieve the name. The subquery clearly depends on the current row S and must be re-evaluated for each row in Sailors. The occurrence of S in the subquery (in the form of the literal S.sid) is called a correlation, and such queries are called correlated queries.
In other words, in a correlated query the inner query depends on the outer query.
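Both styles can be compared on the same data with Python's sqlite3 module. The IN version is a non-correlated form of the boat-103 query (its subquery never mentions S); the EXISTS version is the correlated form described above. The table contents are made-up sample rows, and Reserves is trimmed to (sid, bid) for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT)")
conn.execute("CREATE TABLE Reserves (sid INTEGER, bid INTEGER)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?)",
                 [(22, "Dustin"), (31, "Lubber"), (64, "Horatio")])
conn.executemany("INSERT INTO Reserves VALUES (?, ?)",
                 [(22, 101), (22, 103), (31, 102), (64, 103)])

# Non-correlated: the subquery does not refer to S, so it can be evaluated once.
non_correlated = """
    SELECT S.sname FROM Sailors S
    WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)
"""
# Correlated: the subquery refers to S.sid, so conceptually it runs once per sailor.
correlated = """
    SELECT S.sname FROM Sailors S
    WHERE EXISTS (SELECT * FROM Reserves R WHERE R.bid = 103 AND R.sid = S.sid)
"""
names_in = sorted(row[0] for row in conn.execute(non_correlated))
names_exists = sorted(row[0] for row in conn.execute(correlated))
print(names_in, names_exists)
```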

Aggregate operators:
In addition to simply retrieving data, we often want to perform some computation or summarization. SQL allows the use of arithmetic expressions, and it also provides a powerful class of constructs for computing aggregate values such as MIN and SUM. These features represent a significant extension of relational algebra. SQL supports five aggregate operations, which can be applied to any column, say A, of a relation:
1. COUNT ([DISTINCT] A): The number of (unique) values in the A column.
2. SUM ([DISTINCT] A): The sum of all (unique) values in the A column.
3. AVG ([DISTINCT] A): The average of all (unique) values in the A column.
4. MAX (A): The maximum value in the A column.
5. MIN (A): The minimum value in the A column.
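A quick check of the five operators (including the DISTINCT variants) on a few made-up Sailors rows, again via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5),
    (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0),
])

row = conn.execute("""
    SELECT COUNT(rating),           -- 4 values
           COUNT(DISTINCT rating),  -- 7, 8, 10
           SUM(DISTINCT rating),    -- 7 + 8 + 10
           AVG(age), MAX(age), MIN(age)
    FROM Sailors
""").fetchone()
print(row)  # prints (4, 3, 25, 42.625, 55.5, 35.0)
```

Rating 7 occurs twice, which is why COUNT and COUNT(DISTINCT) differ.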

The GROUP BY and HAVING Clauses:


Thus far, we have applied aggregate operations to all (qualifying) rows in a relation. Often we
want to apply aggregate operations to each of a number of groups of rows in a relation, where the
number of groups depends on the relation instance (i.e., is not known in advance).

Example: Find the age of the youngest sailor for each rating level.
SELECT S.rating, MIN (S.age)
FROM Sailors S
GROUP BY S.rating;

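The HAVING clause is used to filter the groups formed by GROUP BY: only groups that satisfy the HAVING condition appear in the result. (The WHERE clause, by contrast, filters individual rows before they are grouped.)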
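The Python/sqlite3 sketch below runs the youngest-sailor query on a few made-up Sailors rows, then adds a HAVING clause (an illustrative condition, not from the notes) that keeps only rating levels with at least two sailors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0),
    (64, "Horatio", 7, 35.0), (71, "Zorba", 10, 16.0),
])

# One output row per rating group.
per_rating = conn.execute("""
    SELECT S.rating, MIN(S.age) FROM Sailors S
    GROUP BY S.rating ORDER BY S.rating
""").fetchall()

# HAVING discards whole groups: here, ratings with fewer than two sailors.
per_busy_rating = conn.execute("""
    SELECT S.rating, MIN(S.age) FROM Sailors S
    GROUP BY S.rating HAVING COUNT(*) >= 2 ORDER BY S.rating
""").fetchall()
print(per_rating)       # prints [(7, 35.0), (8, 55.5), (10, 16.0)]
print(per_busy_rating)  # prints [(7, 35.0), (10, 16.0)]
```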

Null Values:

A field with a NULL value is a field with no value.


If a field in a table is optional, it is possible to insert a new record or update a record without
adding a value to this field. Then, the field will be saved with a NULL value.
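It is not possible to test for NULL values with comparison operators such as =, <, or <>: a comparison involving NULL evaluates to unknown rather than true, so a condition like column_name = NULL never selects any row.
We have to use the IS NULL and IS NOT NULL operators instead.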
IS NULL Syntax:
SELECT column_names
FROM table_name
WHERE column_name IS NULL;

IS NOT NULL Syntax:


SELECT column_names
FROM table_name
WHERE column_name IS NOT NULL;
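A short sqlite3 sketch makes the difference visible: phone = NULL matches no row, while IS NULL finds the rows whose phone is missing. The Customers table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (name TEXT, phone TEXT)")
# Python's None is stored as SQL NULL.
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("Asha", "555-0101"), ("Ben", None), ("Chen", None)])

eq_null = conn.execute(
    "SELECT name FROM Customers WHERE phone = NULL").fetchall()      # unknown, never true
is_null = conn.execute(
    "SELECT name FROM Customers WHERE phone IS NULL ORDER BY name").fetchall()
is_not_null = conn.execute(
    "SELECT name FROM Customers WHERE phone IS NOT NULL").fetchall()
print(eq_null, is_null, is_not_null)
```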

Outer Joins:
What are SQL JOINs?
In SQL, JOINs are used to unite the rows of two or more tables, based on a column that is shared
between them.
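There are four commonly used types of JOIN: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN. An INNER JOIN returns only the rows that have matching values in both tables. The other three are outer joins: they also keep rows that have no match, filling the missing columns with NULL. A LEFT (OUTER) JOIN keeps every row of the left table, and a RIGHT (OUTER) JOIN keeps every row of the right table. Here we discuss the FULL OUTER JOIN.
What is a Full Outer Join in SQL?
The FULL OUTER JOIN (also written FULL JOIN) returns all of the records from both the left and the right table: rows that match are combined, and rows with no match in the other table are padded with NULLs.
For example, a full outer join of a table of customers and a table of orders might return all customers, including those without any orders, as well as all of the orders. Customers who have placed orders would be matched with their orders using their customer id number.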
A full outer join can return a lot of data, so before you use it, consider whether a more
conservative method might meet your needs.
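Some systems (MySQL, for example) do not support FULL OUTER JOIN, and older SQLite versions lacked it as well, so a common workaround combines a LEFT JOIN with the right-side rows that found no match. A Python/sqlite3 sketch of that workaround; the customers and orders tables, their columns, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ben"), (3, "Chen")])
# Order 102 refers to customer 9, who is not in the customers table.
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(100, 1), (101, 1), (102, 9)])

full_outer = """
    -- every customer, with matching orders or NULL
    SELECT c.name, o.order_id
    FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id
    UNION ALL
    -- plus every order that matched no customer
    SELECT c.name, o.order_id
    FROM orders o LEFT JOIN customers c ON c.customer_id = o.customer_id
    WHERE c.customer_id IS NULL
"""
rows = conn.execute(full_outer).fetchall()
for row in rows:
    print(row)
```

UNION ALL is used rather than UNION because the two halves never overlap, and UNION would also collapse any legitimately repeated rows.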

Introduction:
Schema refinement is intended to address the problems caused by redundancy in a relational schema, and the usual refinement approach is based on decomposition. Redundant storage of information is the root cause of these problems.
Purpose of Normalization:
Database normalization is a technique for organizing the data in a database. It is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics such as insertion, update and deletion anomalies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.
Normalization is used mainly for two purposes:
• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense, i.e., data is logically stored.
Problems caused by redundancy
Storing the same information redundantly, that is, in more than one place within a database, can
lead to several problems.
Update anomalies − If data items are scattered and are not linked to each other properly, then it
could lead to strange situations. For example, when we try to update one data item having its
copies scattered over several places, a few instances get updated properly while a few others are
left with old values. Such instances leave the database in an inconsistent state.
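Deletion anomalies − We try to delete a record, but parts of it are left undeleted because, unknown to us, the same data is also stored somewhere else. Conversely, deleting one fact may unintentionally remove other information stored in the same row.
Insertion anomalies − We try to insert data for a record that does not exist at all; in other words, some information cannot be stored unless other, unrelated information is stored along with it.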

Functional Dependency (FD):


Functional dependency is a set of constraints between two sets of attributes in a relation. A functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must have the same values for attributes B1, B2, ..., Bn. A functional dependency is represented by an arrow sign (→), that is, X → Y, where X functionally determines Y. The left-hand side attributes determine the values of the attributes on the right-hand side.
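Whether a table instance satisfies X → Y can be checked by remembering, for each X value, the Y value first seen with it. A Python sketch using rows from the employee address table in the 3NF example of these notes. Note that an instance can only show that an FD is violated; an FD is a rule about every legal instance of the relation, so one table that happens to satisfy it does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by rows (a list of dicts)."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples with the same lhs values must have the same rhs values.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Rows taken from the employee address table in the 3NF example.
employees = [
    {"emp_id": 1001, "emp_zip": "282005", "emp_city": "Agra"},
    {"emp_id": 1002, "emp_zip": "222008", "emp_city": "Chennai"},
    {"emp_id": 1006, "emp_zip": "282007", "emp_city": "Chennai"},
    {"emp_id": 1101, "emp_zip": "292008", "emp_city": "Pauri"},
    {"emp_id": 1201, "emp_zip": "222999", "emp_city": "Gwalior"},
]
print(fd_holds(employees, ["emp_zip"], ["emp_city"]))  # True
print(fd_holds(employees, ["emp_city"], ["emp_zip"]))  # False: Chennai has two zip codes
```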
Armstrong's Axioms
If F is a set of functional dependencies, then the closure of F, denoted as F+, is the set of all functional dependencies logically implied by F. Armstrong's Axioms are a set of rules that, when applied repeatedly, generate the closure of a set of functional dependencies.
Reflexive rule − If α is a set of attributes and β ⊆ α, then α → β holds.
Augmentation rule − If a → b holds and y is a set of attributes, then ay → by also holds. That is, adding the same attributes to both sides of a dependency does not change the basic dependency.
Transitivity rule − If a → b holds and b → c holds, then a → c also holds. a → b is read as "a functionally determines b".
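In practice F+ is rarely listed in full, since it can be very large. Instead, one computes the attribute closure X+ (every attribute that X determines) and tests whether X → Y is in F+ by checking that Y ⊆ X+. A Python sketch of this standard algorithm; the example F is the one used in the lossless-join example later in these notes.

```python
def attribute_closure(attrs, fds):
    """Return X+, the set of all attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs of attribute sets. If lhs is already in the
    closure, the closure determines lhs (reflexivity) and hence rhs (transitivity),
    so rhs can be added.
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until no FD adds anything new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """X -> Y is in F+ exactly when Y is a subset of X+."""
    return set(rhs) <= attribute_closure(lhs, fds)

F = [({"A", "B"}, {"C"}), ({"C"}, {"E"}), ({"B"}, {"D"}), ({"E"}, {"A"})]
print(sorted(attribute_closure({"C"}, F)))  # ['A', 'C', 'E']
print(implies(F, {"A", "B"}, {"D", "E"}))   # True: AB determines every attribute
print(implies(F, {"B"}, {"C"}))             # False: B+ is only {B, D}
```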
Trivial Functional Dependency
Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is
called a trivial FD. Trivial FDs always hold.
Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial
FD.
Completely non-trivial − If an FD X → Y holds, where X ∩ Y = Φ, it is said to be a completely non-trivial FD.
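The three categories can be captured in a few lines of Python. In this sketch a completely non-trivial FD is also non-trivial, and the function simply returns the most specific label.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y as trivial, completely non-trivial, or (merely) non-trivial."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # Y is a subset of X
    if not lhs & rhs:
        return "completely non-trivial"  # X and Y share no attribute
    return "non-trivial"                 # Y is not a subset of X, but they overlap

print(classify_fd({"emp_id", "emp_name"}, {"emp_name"}))  # trivial
print(classify_fd({"emp_id"}, {"emp_id", "emp_zip"}))     # non-trivial
print(classify_fd({"emp_id"}, {"emp_zip"}))               # completely non-trivial
```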

Normalization Rule: Normalization rules are divided into the following normal forms:
1. First normal form (1NF)
2. Second normal form (2NF)
3. Third normal form (3NF)
4. Boyce-Codd normal form (BCNF)
5. Fourth normal form (4NF)

First Normal Form:


First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all
the attributes in a relation must have atomic domains. The values in an atomic domain are
indivisible units.
Example: Suppose a company wants to store the names and contact details of its employees. It
creates a table that looks like this:

Emp_id Emp_name Emp_address Emp_mobile
101 Rock Delhi 9988993311
102 Brock Vizag 9903388138, 7733223311
103 Stone Mumbai 1122334455
104 Cold Chennai 8899778822, 8899993322

Two employees (Brock and Cold) have two mobile numbers each, so the company stored both numbers in the same field, as you can see in the table above. This table is not in 1NF because the Emp_mobile values are not atomic.

To convert the table into first normal form, we rearrange it so that each row holds a single mobile number:

Emp_id Emp_name Emp_address Emp_mobile


101 Rock Delhi 9988993311
102 Brock Vizag 9903388138
102 Brock Vizag 7733223311
103 Stone Mumbai 1122334455
104 Cold Chennai 8899778822
104 Cold Chennai 8899993322
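The rearrangement is mechanical: emit one row per mobile number. A Python sketch using the employee rows above, where a list-valued Emp_mobile field models the non-atomic column:

```python
# Rows of the original (non-1NF) table: Emp_mobile holds a list of numbers.
unnormalized = [
    (101, "Rock", "Delhi", ["9988993311"]),
    (102, "Brock", "Vizag", ["9903388138", "7733223311"]),
    (103, "Stone", "Mumbai", ["1122334455"]),
    (104, "Cold", "Chennai", ["8899778822", "8899993322"]),
]

def to_first_normal_form(rows):
    """Emit one row per mobile number so that every field is atomic."""
    flat = []
    for emp_id, name, address, mobiles in rows:
        for mobile in mobiles:
            flat.append((emp_id, name, address, mobile))
    return flat

first_nf = to_first_normal_form(unnormalized)
for row in first_nf:
    print(*row)
```

After this step Emp_id alone no longer identifies a row (it repeats for Brock and Cold), so the key of the 1NF table is the pair {Emp_id, Emp_mobile}. Phone numbers are kept as strings because they are identifiers, not quantities.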

Second Normal form:


A table is said to be in 2NF if both the following conditions hold:
• Table is in 1NF (First normal form).
• There must not be any partial dependency of any column on the primary key; that is, every non-prime attribute must be fully functionally dependent on each candidate key as a whole, not on just a part of it.

Example: Suppose a school wants to store the data of teachers and the subjects they teach.

Teacher_id Teacher_age Subject
111 38 Maths
111 38 Physics
222 38 Biology
333 40 Physics
333 40 Chemistry

Candidate key: {teacher_id, subject}

Non-prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age depends on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, which says “no non-prime attribute is dependent on a proper subset of any candidate key of the table”.

To make the table comply with 2NF, we can break it into two tables like this:

Teacher_details table
Teacher_id Teacher_age
111 38
222 38
333 40

Teacher_subject table
Teacher_id Subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry

Now the tables comply with the second normal form (2NF).
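One way to confirm that this split loses nothing is to project the original rows onto the two new tables and join them back on Teacher_id. A small Python sketch using the rows above; sets stand in for relations, so duplicate rows disappear, as in relational projection.

```python
# Rows of the original table: (Teacher_id, Teacher_age, Subject).
teacher = [
    (111, 38, "Maths"), (111, 38, "Physics"), (222, 38, "Biology"),
    (333, 40, "Physics"), (333, 40, "Chemistry"),
]

# Projections onto the two new tables (sets drop duplicate rows).
teacher_details = {(tid, age) for tid, age, _ in teacher}
teacher_subject = {(tid, subject) for tid, _, subject in teacher}

# Natural join of the two tables on Teacher_id.
rejoined = {
    (tid, age, subject)
    for tid, age in teacher_details
    for tid2, subject in teacher_subject
    if tid == tid2
}
print(sorted(teacher_details))   # each teacher's age is now stored once
print(rejoined == set(teacher))  # True
```

The join gives back exactly the five original rows because teacher_id → teacher_age: each teacher's age is stored once in Teacher_details instead of once per subject.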

Third Normal form:


A table design is said to be in 3NF if both the following conditions hold:
• Table must be in 2NF.
• Transitive functional dependency of non-prime attributes on any super key should be removed.
(An attribute that is not part of any candidate key is known as a non-prime attribute.)

In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each non-trivial functional dependency X -> Y, at least one of the following conditions holds:
X is a super key of the table
Y is a prime attribute of the table
Example: Suppose a company wants to store the complete address of each employee in a table with the following fields:

emp_id emp_name emp_zip emp_state emp_city emp_district


1001 John 282005 UP Agra DayalBagh
1002 Ajeet 222008 TN Chennai M-City
1006 Lora 282007 TN Chennai Urrapakkam
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}, and so on

Candidate Keys: {emp_id}

Non-prime attributes: all attributes except emp_id are non-prime, as they are not part of any candidate key. Here, emp_state, emp_city and emp_district depend on emp_zip, and emp_zip depends on emp_id. This makes the non-prime attributes (emp_state, emp_city and emp_district) transitively dependent on the super key (emp_id), which violates the rule of 3NF.

To make this table comply with 3NF, we have to break it into two tables to remove the transitive dependency:

employee table
emp_id emp_name emp_zip
1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999

employee_zip table
emp_zip emp_state emp_city emp_district
282005 UP Agra DayalBagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan
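The 3NF condition (for each non-trivial X → Y, X is a super key or Y is prime) can be checked mechanically once the candidate keys are known. The Python sketch below finds candidate keys by brute force with attribute closure and reports the offending dependencies. It assumes the FDs emp_id → {emp_name, emp_zip} and emp_zip → {emp_state, emp_city, emp_district}, as described above; fd() is a hypothetical helper that builds an FD from space-separated names, and each decomposed table is checked only against the FDs listed for it.

```python
from itertools import combinations

def fd(lhs, rhs):
    """Hypothetical helper: fd("emp_zip", "emp_city") builds emp_zip -> emp_city."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """Minimal super keys, found by brute force (fine for small schemas)."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) >= relation:
                keys.append(cand)
    return keys

def third_nf_violations(relation, fds):
    """List (X, A) pairs where X -> A breaks 3NF: X is not a super key and A is non-prime."""
    prime = set().union(*candidate_keys(relation, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= relation:
            continue  # X is a super key, so X -> Y is allowed
        for attr in sorted(rhs - lhs):
            if attr not in prime:
                violations.append((set(lhs), attr))
    return violations

employee = {"emp_id", "emp_name", "emp_zip", "emp_state", "emp_city", "emp_district"}
F = [fd("emp_id", "emp_name emp_zip"),
     fd("emp_zip", "emp_state emp_city emp_district")]
violations = third_nf_violations(employee, F)
print(violations)  # three violations, all with emp_zip on the left-hand side

# After the decomposition, each table is checked with the FDs that apply to it.
employee_table = third_nf_violations({"emp_id", "emp_name", "emp_zip"},
                                     [fd("emp_id", "emp_name emp_zip")])
zip_table = third_nf_violations({"emp_zip", "emp_state", "emp_city", "emp_district"},
                                [fd("emp_zip", "emp_state emp_city emp_district")])
print(employee_table, zip_table)  # [] []
```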

Boyce Codd normal form (BCNF):


It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. For a table to comply with BCNF, the following conditions must be satisfied:
• R must be in 3rd Normal Form, and, for each non-trivial functional dependency (X -> Y), X must be a super key.

Example: Suppose there is a company wherein employees work in more than one department.
They store the data like this:
emp_id emp_nationality emp_dept dept_type dept_no_of_emp
1001 Austrian Production and planning D001 200
1001 Austrian stores D001 250
1002 American design and technical support D134 100
1002 American Purchasing department D134 600

Functional dependencies in the table above:


emp_id ->emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}

The table is not in BCNF because neither emp_id nor emp_dept alone is a key, yet each appears on the left-hand side of a functional dependency.

To make the table comply with BCNF, we can break it into three tables like this:

emp_nationality table
emp_id emp_nationality
1001 Austrian
1002 American

emp_dept table
emp_dept dept_type dept_no_of_emp
Production and planning D001 200
stores D001 250
design and technical support D134 100
Purchasing department D134 600
emp_dept_mapping table
emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department

Functional dependencies:
emp_id ->emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys: For first table: emp_id

For second table: emp_dept

For third table: {emp_id, emp_dept}

This is now in BCNF because, in both functional dependencies, the left-hand side is a key of the table in which the dependency holds.
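A BCNF test can be sketched in Python as well. Rather than checking only the listed FDs, the code below tries every subset X of a table's attributes and asks whether X determines some other attribute of that table without determining all of them; this also catches dependencies that hold in a decomposed table only through the closure of F. It is exponential in the number of attributes, which is acceptable for tables this small.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(table, fds):
    """Return an attribute set X that breaks BCNF for table, or None if table is in BCNF."""
    table = set(table)
    for size in range(1, len(table)):
        for combo in combinations(sorted(table), size):
            x = set(combo)
            determined = closure(x, fds) & table
            # X -> (determined - X) is a non-trivial FD inside table;
            # BCNF requires X to be a super key of table in that case.
            if determined != x and not table <= determined:
                return x
    return None

F = [({"emp_id"}, {"emp_nationality"}),
     ({"emp_dept"}, {"dept_type", "dept_no_of_emp"})]
original = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}
decomposed = [
    {"emp_id", "emp_nationality"},
    {"emp_dept", "dept_type", "dept_no_of_emp"},
    {"emp_id", "emp_dept"},
]
print(bcnf_violation(original, F))                 # {'emp_dept'}
print([bcnf_violation(t, F) for t in decomposed])  # [None, None, None]
```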

Concept of Surrogate Key:


A surrogate key is any column or set of columns that can be declared as the primary key instead of a "real" or natural key. Sometimes there can be several natural keys that could be declared as the primary key, and these are all called candidate keys. So, a surrogate is also a candidate key. A table could actually have more than one surrogate key, although this would be unusual. The most common type of surrogate key is an incrementing integer, such as an AUTO_INCREMENT column in MySQL, a sequence in Oracle, or an IDENTITY column in SQL Server.
In general, a surrogate key is simply a new field with artificially generated values whose sole purpose is to serve as the primary key.
A surrogate key has the following characteristics:
1) It is typically an integer.
2) It has no meaning. You will not be able to know the meaning of that row of data based on
the surrogate key value.
3) It is not visible to end users. End users should not see a surrogate key in a report.
The following is the PET_OWNER table, with Owner Id as a surrogate key.
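The PET_OWNER table itself is not reproduced in these notes, so the sketch below assumes a few hypothetical columns (last_name, first_name, email) and made-up owners. It uses SQLite, where a column declared INTEGER PRIMARY KEY is filled in automatically when no value is supplied, much like the auto-increment and identity columns mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# owner_id is the surrogate key; its values carry no business meaning.
conn.execute("""
    CREATE TABLE PET_OWNER (
        owner_id   INTEGER PRIMARY KEY,
        last_name  TEXT NOT NULL,
        first_name TEXT NOT NULL,
        email      TEXT
    )
""")
owners = [("Rao", "Anil", "anil@example.com"), ("Khan", "Sara", "sara@example.com")]
for last, first, email in owners:
    # No owner_id is supplied; SQLite generates it.
    conn.execute(
        "INSERT INTO PET_OWNER (last_name, first_name, email) VALUES (?, ?, ?)",
        (last, first, email),
    )
rows = conn.execute(
    "SELECT owner_id, last_name FROM PET_OWNER ORDER BY owner_id").fetchall()
print(rows)  # prints [(1, 'Rao'), (2, 'Khan')]
```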

Decomposition:
The decomposition of a relation schema R consists of replacing the relation schema by two (or more) relation schemas that each contain a subset of the attributes of R and together include all attributes in R. Intuitively, we want to store the information in any given instance of R by storing projections of the instance.
Problems related to Decomposition:
1. Lossless join decomposition
2. Dependency preserving decomposition

Lossless Join Decomposition:


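Let R be a relation schema and F be a set of functional dependencies over R. A decomposition {R1, R2, …, Rn} of a relation R is called a lossless decomposition for R if the natural join of R1, R2, …, Rn produces exactly the same relation R.

A decomposition is lossless if we can recover the original relation from the decomposed relations:

R(A, B, C)
Decompose: R1(A, B), R2(A, C)
Recover (natural join of R1 and R2): R'(A, B, C)

If R' = R, the decomposition is lossless; otherwise, it is a lossy decomposition. (The natural join always contains every tuple of R, so a lossy decomposition is one whose join adds extra, spurious tuples.)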
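To see what "recover" means on data, the Python sketch below projects two small, hypothetical instances of R(A, B, C) onto R1(A, B) and R2(A, C) and joins the projections back. When A → B holds in the instance nothing is lost; when it does not, the join produces spurious tuples. (For a decomposition into two schemas, a standard test is that it is lossless exactly when R1 ∩ R2 → R1 or R1 ∩ R2 → R2 is in F+.)

```python
def project(rows, cols):
    """Relational projection: keep the given column positions, drop duplicates."""
    return {tuple(row[c] for c in cols) for row in rows}

def natural_join_on_first(r1, r2):
    """Natural join of R1(A, X) and R2(A, Y) on their shared first column A."""
    return {(a, b, c) for a, b in r1 for a2, c in r2 if a == a2}

# Hypothetical instance of R(A, B, C) in which A -> B holds.
r = {(1, "x", "p"), (1, "x", "q"), (2, "y", "p")}
recovered = natural_join_on_first(project(r, [0, 1]), project(r, [0, 2]))
print(recovered == r)  # True: nothing lost

# Hypothetical instance in which A -> B does not hold.
s = {(1, "x", "p"), (1, "y", "q")}
spurious = natural_join_on_first(project(s, [0, 1]), project(s, [0, 2])) - s
print(sorted(spurious))  # two tuples that were never in s
```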

For Example:
Let R(A, B, C, D, E) be a relation with the functional dependencies

F = {AB→C, C→E, B→D, E→A}

R is decomposed into R1 and R2 with attributes (B, C, D) and (A, C, E) respectively.

Now we have to determine whether the decomposition of R into R1 and R2 is lossless or lossy.

R1(B, C, D) R2(A, C, E)

Consider a distinguished symbol 'α' to fill up the table.

A B C D E
R1
R2

Now fill in the table: in each row, write α in the columns of the attributes that the corresponding sub-schema contains.

    A B C D E
R1  - α α α -
R2  α - α - α

If, at the end, at least one row is completely filled with α, then the natural join of the sub-schemas produces exactly the same relation R.

Now we keep adding 'α' to the table: for each functional dependency X → Y, check whether the following conditions hold:
1. Two or more rows have 'α' in all the columns of X (the left-hand side).
2. One or more of these rows have 'α' in the column of Y (the right-hand side).
3. At least one of these rows has a non-'α' entry in the column of Y.
If they do, write α into the Y column of those rows. Repeat until no functional dependency changes the table.

Applying these conditions, C → E puts α in column E of row R1 (both rows have α in C, and R2 has α in E), and then E → A puts α in column A of row R1. The table becomes:

    A B C D E
R1  α α α α α
R2  α - α - α

Here one row is completely filled with 'α', so the decomposition is lossless.
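The tableau procedure above is usually called the chase. Below is a Python sketch of the general version, which is slightly more general than the simplified rule in these notes: besides writing α, it also equates two non-α symbols whenever a dependency forces them to be equal. The placeholder symbols (such as b0_A) and the helper fd() are illustrative choices, not part of the notes.

```python
def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") builds the FD AB -> C."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def is_lossless(relation, decomposition, fds):
    """Chase test: True if the decomposition of relation is a lossless join."""
    attrs = sorted(relation)
    # Row i: "α" where R_i has the attribute, else a placeholder unique to that cell.
    table = [{a: ("α" if a in ri else f"b{i}_{a}") for a in attrs}
             for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that agree on every left-hand-side column.
            groups = {}
            for row in table:
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for rows in groups.values():
                for a in rhs - lhs:
                    values = {row[a] for row in rows}
                    if len(values) < 2:
                        continue
                    # Make the symbols equal, preferring α, everywhere in column a.
                    target = "α" if "α" in values else min(values)
                    for row in table:
                        if row[a] in values and row[a] != target:
                            row[a] = target
                            changed = True
    # Lossless if some row became all α.
    return any(all(row[a] == "α" for a in attrs) for row in table)

R = set("ABCDE")
F = [fd("A B", "C"), fd("C", "E"), fd("B", "D"), fd("E", "A")]
print(is_lossless(R, [set("BCD"), set("ACE")], F))  # True

# A lossy case: R(A, B, C) with only A -> B, split on the non-key attribute B.
print(is_lossless(set("ABC"), [set("AB"), set("BC")], [fd("A", "B")]))  # False
```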

Dependency Preserving Decomposition:


A decomposition of a relation R into R1, R2, R3, …, Rn is a dependency preserving decomposition with respect to the set of functional dependencies F that hold on R if and only if the following holds:
(F1 ∪ F2 ∪ F3 ∪ … ∪ Fn)+ = F+
where,
F1, F2, F3, …, Fn – the sets of functional dependencies of relations R1, R2, R3, …, Rn (Fi is the set of dependencies in F+ that involve only attributes of Ri).

(F1 ∪ F2 ∪ F3 ∪ … ∪ Fn)+ – the closure of the union of all these sets of functional dependencies.

F+ – the closure of the set of functional dependencies F of R.

In other words, if the closure of the union of the functional dependency sets of the individual relations R1, R2, R3, …, Rn is equal to the closure of the set of functional dependencies of the main relation R (before decomposition), then we say that the decomposition D is dependency preserving.

Example: Assume R(A, B, C, D) with FDs A→B, B→C, C→D.

Let us decompose R into R1 and R2 as follows:

R1(A, B, C) and R2(C, D). The FDs A→B and B→C hold in R1, and the FD C→D holds in R2.

All of the original functional dependencies are preserved, so this decomposition is dependency preserving.
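Computing F1, …, Fn explicitly means projecting F+, which can be very large. A common alternative test, sketched below in Python, checks each X → Y in F directly: starting from Z = X, it repeatedly adds (Z ∩ Ri)+ ∩ Ri for every Ri, and the dependency is preserved if Y ends up inside Z. The second call uses an additional example, R(A, B, C) with AB → C and C → B split into (A, C) and (B, C), which is not dependency preserving.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves_dependencies(fds, decomposition):
    """True if every FD in fds is implied by the FDs that hold inside the Ri."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # What Ri alone can derive from the part of z it contains.
                gained = closure(z & ri, fds) & ri
                if not gained <= z:
                    z |= gained
                    changed = True
        if not rhs <= z:
            return False  # lhs -> rhs cannot be enforced table by table
    return True

# Example from the notes: R(A, B, C, D) with A -> B, B -> C, C -> D.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"})]
print(preserves_dependencies(F, [{"A", "B", "C"}, {"C", "D"}]))  # True

# Additional example: AB -> C is lost when R(A, B, C) is split into (A, C) and (B, C).
G = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
print(preserves_dependencies(G, [{"A", "C"}, {"B", "C"}]))  # False
```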

Fourth Normal Form:


A relation is said to be in the fourth normal form if it is already in Boyce-Codd normal form and it has no non-trivial multivalued dependency whose left-hand side is not a super key.
It builds on the first three normal forms (1NF, 2NF and 3NF) and the Boyce-Codd Normal Form (BCNF). It states that, in addition to meeting the requirements of BCNF, a relation must not contain any non-trivial multivalued dependency X ->> Y in which X is not a super key.

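Multivalued Dependency: An MVD represents a dependency between attributes. A multivalued dependency occurs when there are two or more independent multivalued attributes in a table. Formally, X ->> Y holds in a relation R if, for each value of X, the set of Y values is independent of the values of the remaining attributes of R.

For example: Consider a bike manufacturing company that produces two colours (Black and Red) of each model every year.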

bike_model manuf_year color


M1001 2007 Black
M1001 2007 Red
M2012 2008 Black
M2012 2008 Red
M2222 2009 Black
M2222 2009 Red

Here columns manuf_year and color are independent of each other and dependent on bike_model.
In this case these two columns are said to be multivalued dependent on bike_model. These
dependencies can be represented like this:

bike_model ->>manuf_year
bike_model ->>color
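The definition can be checked mechanically on an instance: X ->> Y holds if, whenever two tuples t1 and t2 agree on X, the tuple built from t1's X and Y values and t2's remaining values is also in the table. A minimal Python sketch using the bike rows above (years stored as integers); the second instance is a hypothetical variation. As with FDs, an instance can only show that an MVD is violated, not that it holds for all future data.

```python
from itertools import product

def mvd_holds(rows, columns, x, y):
    """Check whether the MVD x ->> y holds in rows (tuples ordered as in columns)."""
    pos = {c: i for i, c in enumerate(columns)}
    rest = [c for c in columns if c not in x and c not in y]
    present = set(rows)
    for t1, t2 in product(rows, repeat=2):
        if any(t1[pos[c]] != t2[pos[c]] for c in x):
            continue  # the rule only applies to tuples that agree on x
        # Build t3: x and y values from t1, all remaining values from t2.
        t3 = list(t1)
        for c in rest:
            t3[pos[c]] = t2[pos[c]]
        if tuple(t3) not in present:
            return False
    return True

columns = ["bike_model", "manuf_year", "color"]
bikes = [("M1001", 2007, "Black"), ("M1001", 2007, "Red"),
         ("M2012", 2008, "Black"), ("M2012", 2008, "Red"),
         ("M2222", 2009, "Black"), ("M2222", 2009, "Red")]
print(mvd_holds(bikes, columns, ["bike_model"], ["manuf_year"]))  # True
print(mvd_holds(bikes, columns, ["bike_model"], ["color"]))       # True

# Hypothetical variation: same model, year and colour no longer independent.
mixed = [("M1001", 2007, "Black"), ("M1001", 2008, "Red")]
print(mvd_holds(mixed, columns, ["bike_model"], ["manuf_year"]))  # False
```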

Denormalization:
Denormalization is a database optimization technique in which we add redundant data to the database to get rid of complex join operations. This is done to speed up data access. Denormalization is done after normalization to improve the performance of the database: data from one table is included in another table to reduce the number of joins in queries, which helps speed up query performance.
Example: Suppose that after normalization we have two tables: first, the Student table and second, the Branch table. The Student table has the attributes Roll_no, Student-name, Age, and Branch_id.
The branch table is related to the Student table with Branch_id as the foreign key in the Student
table.

If we want the names of students along with the names of their branches, we need to perform a join operation. The problem is that if the tables are large, the join operation can take a lot of time. So, we can copy the Branch_name data from the Branch table into the Student table; this removes the time that would have been spent on the join operation and thus optimizes the database.
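A minimal sqlite3 sketch of the Student/Branch example. The Branch table's columns (Branch_id, Branch_name) and all rows are assumptions for illustration, and Student-name is written Student_name because a hyphen is not allowed in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (Branch_id INTEGER PRIMARY KEY, Branch_name TEXT);
    CREATE TABLE Student (Roll_no INTEGER PRIMARY KEY, Student_name TEXT, Age INTEGER,
                          Branch_id INTEGER REFERENCES Branch(Branch_id));
    INSERT INTO Branch VALUES (1, 'CSE'), (2, 'ECE');
    INSERT INTO Student VALUES (101, 'Ravi', 20, 1), (102, 'Meena', 21, 2),
                               (103, 'Arjun', 20, 1);
""")

# Normalized design: the branch name needs a join.
joined = conn.execute("""
    SELECT s.Student_name, b.Branch_name
    FROM Student s JOIN Branch b ON s.Branch_id = b.Branch_id
    ORDER BY s.Roll_no
""").fetchall()

# Denormalized design: copy Branch_name into Student once...
conn.executescript("""
    ALTER TABLE Student ADD COLUMN Branch_name TEXT;
    UPDATE Student SET Branch_name =
        (SELECT b.Branch_name FROM Branch b WHERE b.Branch_id = Student.Branch_id);
""")
# ...after which the same answer comes from a single table, with no join.
single_table = conn.execute(
    "SELECT Student_name, Branch_name FROM Student ORDER BY Roll_no").fetchall()
print(joined == single_table)  # True
```

The price is the redundancy discussed below: if a branch is renamed, every matching Student row must be updated as well, or the two copies disagree.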
Advantages of Denormalization:
• Query execution is fast since we have to join fewer tables.
Disadvantages of Denormalization:
• Because the data is redundant, update and insert operations are more expensive and take more time: the same value must be written in several places.
• Data integrity is harder to maintain in a denormalized design. Because of the redundancy, copies of the same data can become inconsistent.
