Unit 2
Course description:
This course is designed to introduce undergraduate students to the foundations of
database systems, focusing on basics such as the relational algebra and data model,
schema normalization, query optimization, and transactions.
CO 1. Apply the database management system concepts.
CO 2. Design relational and ER model for database design.
CO 3. Examine issues in data storage and query processing and frame appropriate solutions.
CO 4. Analyze the role and issues like efficiency, privacy, security, ethical responsibility
and strategic advantage in data management.
Abraham Silberschatz, Henry F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill, 6th
Edition, 2011.
Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems, Addison-Wesley, 5th
Edition, 2005.
Raghu Ramakrishnan, Database Management Systems, Tata McGraw-Hill, 3rd Edition, 2006.
Hector Garcia-Molina, Jeffrey D. Ullman and Jennifer Widom, Database Systems: The Complete Book,
Prentice Hall, 2003.
Tuples (or rows)
The set of allowed values for each attribute is called the domain of the attribute
Attribute values are (normally) required to be atomic; that is, indivisible
The special value null is a member of every domain. It indicates that the value is “unknown”.
The null value causes complications in the definition of many operations
Relations are unordered: the order of tuples is irrelevant.
Let K ⊆ R
K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R)
Example: {ID} and {ID,name} are both superkeys of instructor.
A super key is any set of one or more attributes that uniquely identifies each row in a table.
This means that every set of columns whose values are enough to determine the rest of a row
uniquely is considered a super key.
Every super key is a superset of some candidate key; a candidate key is a minimal super key.
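The definitions above can be checked mechanically on a small relation instance. The sketch below is only an illustration: the rows, and the helper names is_superkey, superkeys and candidate_keys, are hypothetical. A key is a property of every possible relation r(R), so a test on one instance can rule a key out but cannot prove it.

```python
from itertools import combinations

# Hypothetical instance of instructor(ID, name, dept_name); the rows are made up.
ATTRS = ("ID", "name", "dept_name")
ROWS = [
    ("10101", "Srinivasan", "Comp. Sci."),
    ("12121", "Wu", "Finance"),
    ("15151", "Mozart", "Music"),
    ("22222", "Wu", "Finance"),  # same name and department as another row
]

def is_superkey(attrs, rows, key):
    """True if no two rows agree on every attribute in `key`."""
    idx = [attrs.index(a) for a in key]
    seen = set()
    for row in rows:
        value = tuple(row[i] for i in idx)
        if value in seen:
            return False
        seen.add(value)
    return True

def superkeys(attrs, rows):
    """All non-empty attribute sets that are superkeys of this instance."""
    return [
        set(combo)
        for size in range(1, len(attrs) + 1)
        for combo in combinations(attrs, size)
        if is_superkey(attrs, rows, combo)
    ]

def candidate_keys(attrs, rows):
    """Superkeys that have no proper subset which is also a superkey."""
    sks = superkeys(attrs, rows)
    return [k for k in sks if not any(other < k for other in sks)]

print(superkeys(ATTRS, ROWS))       # every superkey contains ID
print(candidate_keys(ATTRS, ROWS))  # only the minimal one remains
```

On this instance every superkey contains ID, and {ID} is the only minimal one, which matches the statement that a super key is a superset of a candidate key.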
The primary key of a table is the candidate key chosen to serve as the table’s identifying attribute or
set of attributes.
A primary key is a column, or a set of columns, that uniquely identifies every record in the table.
A table can have only one primary key, and therefore only one PRIMARY KEY constraint.
No two rows can have the same primary key value.
The PRIMARY KEY (PK) constraint on a column or set of columns does not allow null values or
duplicates in them.
A foreign key (explained below) that refers to a primary key cannot change the values of that primary key.
Candidate keys are the minimal sets of attributes that uniquely identify the rows of a table.
The primary key of a table is selected from its candidate keys.
So every candidate key has the same uniqueness property as the primary key.
A foreign key requires each value in a column or set of columns to match a primary key value in the
referenced table.
Foreign keys help to maintain data and referential integrity.
Composite Key is a set of two or more attributes that help identify each tuple in a table uniquely.
The attributes in the set may not be unique when considered separately.
A unique key is a column or set of columns that uniquely identifies each record in a table.
A unique key differs from a primary key because it can contain null values, whereas a primary key
cannot. How many nulls are allowed depends on the DBMS: SQL Server allows only one, while most
other systems allow several.
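The difference between primary, unique and composite keys can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the student and enrolment tables, their columns and the rejected helper are hypothetical, and the behaviour shown for nulls is SQLite's, which other systems may not share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table student (
        -- SQLite needs an explicit not null on a non-integer primary key
        roll_no char(6) primary key not null,
        email   varchar(40) unique          -- unique key: no duplicates
    );
    create table enrolment (
        roll_no   char(6) not null,
        course_id char(6) not null,
        primary key (roll_no, course_id)    -- composite key
    );
""")

def rejected(sql):
    """Run one statement; return True if a constraint rejects it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

conn.execute("insert into student values ('CS0012', 'a@example.edu')")
dup_pk = rejected("insert into student values ('CS0012', 'b@example.edu')")
null_pk = rejected("insert into student values (null, 'c@example.edu')")
dup_unique = rejected("insert into student values ('EE1127', 'a@example.edu')")

# Two rows with a null email: SQLite accepts both.
null_unique_1 = rejected("insert into student values ('EE1127', null)")
null_unique_2 = rejected("insert into student values ('EE1128', null)")

# Each attribute of the composite key may repeat; the pair may not.
conn.execute("insert into enrolment values ('CS0012', 'DB101')")
same_student = rejected("insert into enrolment values ('CS0012', 'OS201')")
dup_pair = rejected("insert into enrolment values ('CS0012', 'DB101')")

print(dup_pk, null_pk, dup_unique, null_unique_1, null_unique_2,
      same_student, dup_pair)
```

The composite key case shows why the attributes in the set need not be unique when considered separately: the roll number repeats, and only the repeated pair is refused.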
σ_{subject = "database"}(Books)
Output: selects the tuples from Books whose subject is 'database'.
Project: ∏_{A,C}(r)
Relations r, s:
Union: r ∪ s
Intersection: r ∩ s
Note: r ∩ s = r – (r – s)
Difference: r – s
Cartesian product: r × s
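The relational algebra operations can be sketched with Python sets of tuples, since a relation is a set of tuples. The relations r and s and the helpers select and project below are hypothetical, and attributes are addressed by position rather than by name to keep the sketch short.

```python
from itertools import product

# Hypothetical relations r(A, B) and s(A, B), stored as sets of tuples.
r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}

def select(rel, pred):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in rel if pred(t)}

def project(rel, *positions):
    """Projection: keep the listed columns; duplicates vanish in a set."""
    return {tuple(t[i] for i in positions) for t in rel}

union = r | s
intersection = r & s
difference = r - s
cartesian = {x + y for x, y in product(r, s)}  # concatenate every pair

print(select(r, lambda t: t[1] == 1))
print(project(r, 0))
print(union, intersection, difference, len(cartesian))
print(intersection == r - (r - s))  # the identity noted above
```

Projection on column A returns two tuples from a three-tuple relation because the result is again a set.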
Find the ID, name, dept_name, salary for instructors whose salary is greater than $80,000
{t | t ∈ instructor ∧ t[salary] > 80000}
Notice that a relation on schema (ID, name, dept_name, salary) is implicitly defined by the query
As in the previous query, but output only the ID attribute value
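Tuple relational calculus reads much like a set comprehension, so both queries can be sketched in Python. The rows are hypothetical, and the calculus expression in the second comment is the usual textbook formulation, given here as an assumption.

```python
from collections import namedtuple

Instructor = namedtuple("Instructor", "ID name dept_name salary")

# Hypothetical instance of the instructor relation.
instructor = {
    Instructor("10101", "Srinivasan", "Comp. Sci.", 65000),
    Instructor("22222", "Einstein", "Physics", 95000),
    Instructor("83821", "Brandt", "Comp. Sci.", 92000),
}

# {t | t ∈ instructor ∧ t[salary] > 80000}
high_paid = {t for t in instructor if t.salary > 80000}

# {t | ∃ s ∈ instructor (t[ID] = s[ID] ∧ s[salary] > 80000)}
# The result tuples have a single attribute, ID.
high_paid_ids = {(s.ID,) for s in instructor if s.salary > 80000}

print(sorted(t.name for t in high_paid))
print(sorted(high_paid_ids))
```

In the second query the tuple variable t is not a member of instructor; its schema (ID) is implied by the only attribute the formula constrains.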
In domain relational calculus, the variables range over the domains of attributes (individual values)
rather than over whole tuples.
It uses the same operators as tuple calculus.
A nonprocedural query language equivalent in power to the tuple relational calculus
Each query is an expression of the form:
{< x1, x2, ..., xn > | P(x1, x2, ..., xn)}
Find the ID, name, dept_name, salary for instructors whose salary is greater than $80,000
{< i, n, d, s > | < i, n, d, s > ∈ instructor ∧ s > 80000}
Find the names of all instructors whose department is in the Watson building
{< n > | ∃ i, d, s (< i, n, d, s > ∈ instructor ∧
∃ b, a (< d, b, a > ∈ department ∧ b = “Watson”))}
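In domain relational calculus each variable stands for one attribute value, which corresponds to unpacking a tuple into separate names. The sketch below assumes hypothetical rows and the schemas instructor(ID, name, dept_name, salary) and department(dept_name, building, budget).

```python
# Hypothetical instances; each tuple lists the attribute values in schema order.
instructor = {
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("22222", "Einstein", "Physics", 95000),
    ("83821", "Brandt", "Comp. Sci.", 92000),
}
department = {
    ("Comp. Sci.", "Taylor", 100000),
    ("Physics", "Watson", 70000),
}

# {<i, n, d, s> | <i, n, d, s> ∈ instructor ∧ s > 80000}
high_paid = {(i, n, d, s) for (i, n, d, s) in instructor if s > 80000}

# {<n> | ∃ i, d, s (<i, n, d, s> ∈ instructor ∧
#        ∃ b, a (<d, b, a> ∈ department ∧ b = "Watson"))}
watson_names = {
    (n,)
    for (i, n, d, s) in instructor
    if any(d2 == d and b == "Watson" for (d2, b, a) in department)
}

print(sorted(high_paid))
print(sorted(watson_names))
```

The existential quantifier over b and a becomes any(...): a name is kept as soon as one matching department tuple exists.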
Topics:
Relational Query Languages
Relational Algebra (Project operation, Set Union, Set Intersection, Set Difference)
Tuple Relational Calculus
Domain Relational Calculus
The SQL data-definition language (DDL) allows the specification of information about relations,
including:
The schema for each relation.
The domain of values associated with each attribute.
Integrity constraints
Example:
create table instructor (
    ID char(5),
    name varchar(20),
    dept_name varchar(20),
    salary numeric(8,2));
Integrity constraints in create table:
not null
primary key (A1, ..., An)
foreign key (Am, ..., An) references r
Example:
create table instructor (
    ID char(5),
    name varchar(20) not null,
    dept_name varchar(20),
    salary numeric(8,2),
    primary key (ID),
    foreign key (dept_name) references department);
A primary key declaration on an attribute automatically ensures not null.
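The effect of these constraints can be observed by running the definition through Python's built-in sqlite3 module. This is a sketch under assumptions: the department schema and all rows are hypothetical, rejected is a helper introduced here, and SQLite enforces foreign keys only after the pragma shown. SQLite is also an exception to the automatic not null rule for non-integer primary keys, so that case is not tested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # off by default in SQLite
conn.executescript("""
    create table department (
        dept_name varchar(20),
        building  varchar(15),
        budget    numeric(12,2),
        primary key (dept_name)
    );
    create table instructor (
        ID        char(5),
        name      varchar(20) not null,
        dept_name varchar(20),
        salary    numeric(8,2),
        primary key (ID),
        foreign key (dept_name) references department
    );
    insert into department values ('Comp. Sci.', 'Taylor', 100000);
    insert into instructor values ('10101', 'Srinivasan', 'Comp. Sci.', 65000);
""")

def rejected(sql):
    """Run one statement; return True if a constraint rejects it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# not null: an instructor without a name is refused.
no_name = rejected("insert into instructor values ('12121', null, 'Comp. Sci.', 90000)")
# primary key: a second row with the same ID is refused.
dup_id = rejected("insert into instructor values ('10101', 'Wu', 'Comp. Sci.', 90000)")
# foreign key: the department must already exist.
bad_dept = rejected("insert into instructor values ('15151', 'Mozart', 'Music', 40000)")
# A row that satisfies every constraint is accepted.
ok_row = rejected("insert into instructor values ('45565', 'Katz', 'Comp. Sci.', 75000)")

print(no_name, dup_id, bad_dept, ok_row)
```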
The where clause specifies conditions that the result must satisfy.
Corresponds to the selection predicate of the relational algebra.
To find all instructors in Comp. Sci. dept
select name
from instructor
where dept_name = 'Comp. Sci.'
Comparison results can be combined using the logical connectives and, or, and not
To find all instructors in Comp. Sci. dept with salary > 80000
select name
from instructor
where dept_name = 'Comp. Sci.' and salary > 80000
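Both queries can be run against a small table using Python's built-in sqlite3 module. The rows below are hypothetical; the two select statements are the ones shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (
        ID        char(5),
        name      varchar(20),
        dept_name varchar(20),
        salary    numeric(8,2)
    );
    insert into instructor values ('10101', 'Srinivasan', 'Comp. Sci.', 65000);
    insert into instructor values ('45565', 'Katz', 'Comp. Sci.', 75000);
    insert into instructor values ('83821', 'Brandt', 'Comp. Sci.', 92000);
    insert into instructor values ('22222', 'Einstein', 'Physics', 95000);
""")

# where with a single comparison
cs_names = sorted(
    row[0]
    for row in conn.execute(
        "select name from instructor where dept_name = 'Comp. Sci.'"
    )
)

# two comparisons combined with the logical connective and
cs_high = sorted(
    row[0]
    for row in conn.execute(
        "select name from instructor "
        "where dept_name = 'Comp. Sci.' and salary > 80000"
    )
)

print(cs_names)
print(cs_high)
```

The results are sorted in Python because a query without an order by clause does not promise any row order.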
Topics:
History of SQL
Data Definition Language (Create table, Drop)
Data Manipulation Language (Select clause, WHERE clause, Insert, Delete)
In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual
relations stored in the database.)
Consider a person who needs to know an instructor's name and department, but not the salary.
This person should see a relation described, in SQL, by
A view provides a mechanism to hide certain data from the view of certain users.
Any relation that is not part of the conceptual model but is made visible to a user as a “virtual relation” is
called a view.
A view is defined using the create view statement, which has the form
create view v as <query expression>
where <query expression> is any legal SQL expression. The view name is represented by v.
Once a view is defined, the view name can be used to refer to the virtual relation that the view generates.
View definition is not the same as creating a new relation by evaluating the query expression.
Rather, a view definition causes the saving of an expression; the expression is substituted into queries
using the view.
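That a view stores an expression and not a copy of the data can be observed with Python's built-in sqlite3 module. The view name faculty and the rows are hypothetical; the view hides the salary column, as in the scenario described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (
        ID        char(5),
        name      varchar(20),
        dept_name varchar(20),
        salary    numeric(8,2)
    );
    insert into instructor values ('10101', 'Srinivasan', 'Comp. Sci.', 65000);

    -- hypothetical view: everything except salary
    create view faculty as
        select ID, name, dept_name from instructor;
""")

# The view exposes only the columns named in its query expression.
columns = [d[0] for d in conn.execute("select * from faculty").description]

before = conn.execute("select count(*) from faculty").fetchone()[0]

# Insert into the base relation after the view was defined.
conn.execute("insert into instructor values ('22222', 'Einstein', 'Physics', 95000)")

# The view is re-evaluated on use, so the new row is visible through it.
after = conn.execute("select count(*) from faculty").fetchone()[0]

print(columns, before, after)
```

If the view had been materialized as a new relation when it was created, the second count would still equal the first.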
r:
A  B  C
α  1  A
β  2  B

∏_{A,B}(r):
A  B
α  1
β  2

∏_{B,C}(r):
B  C
1  A
2  B

∏_{A,B}(r) ⋈ ∏_{B,C}(r):
A  B  C
α  1  A
β  2  B
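Whether a decomposition loses information can be checked by projecting a relation and joining the projections back together. The sketch below uses illustrative values for r(A, B, C) and adds a second, hypothetical relation r2 in which B no longer identifies a row, to show what a lossy decomposition looks like.

```python
def decompose_and_join(rel):
    """Project r(A, B, C) onto (A, B) and (B, C), then natural-join on B."""
    ab = {(a, b) for (a, b, c) in rel}
    bc = {(b, c) for (a, b, c) in rel}
    return {(a, b, c) for (a, b) in ab for (b2, c) in bc if b == b2}

# Illustrative instance: every B value occurs in exactly one tuple.
r = {("α", 1, "A"), ("β", 2, "B")}
rejoined = decompose_and_join(r)

# Hypothetical counter-example: both tuples share B = 1.
r2 = {("α", 1, "A"), ("β", 1, "B")}
rejoined2 = decompose_and_join(r2)

print(rejoined == r)                # the join gives back exactly r
print(rejoined2 == r2)              # the join differs from r2
print(sorted(rejoined2 - r2))       # spurious tuples created by the join
```

For r the join reproduces the original relation. For r2 it produces extra tuples, so the original can no longer be told apart from other relations with the same projections.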
Suppose a manufacturing company stores employee details in a table named employee that has four
attributes: emp_id for the employee’s id, emp_name for the employee’s name, emp_address for
the employee’s address and emp_dept for the department in which the employee works.
The above table is not normalized. We will see the problems that we face when a table is not normalized.
Atomicity is actually a property of how the elements of the domain are used.
Example: Strings would normally be considered indivisible
Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127
If the first two characters are extracted to find the department, the domain of roll numbers is not atomic.
Doing so is a bad idea: leads to encoding of information in application program rather than in the database.
For a relation to be in second normal form (2NF), it must first be in first normal form (1NF).
In 2NF, every non-key attribute is fully functionally dependent on the primary key.
Example: A teacher can teach more than one subject.
In the given table, the non-prime attribute TEACHER_AGE depends on TEACHER_ID, which is a
proper subset of the candidate key, so the table violates 2NF.
To convert the given table to 2NF, the given table is decomposed into two tables:
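The decomposition can be sketched with Python sets. Everything specific here is an assumption made for illustration: the rows, the attribute name SUBJECT, the candidate key {TEACHER_ID, SUBJECT}, and the table names teacher_detail and teacher_subject.

```python
# Hypothetical TEACHER(TEACHER_ID, SUBJECT, TEACHER_AGE) rows.
# Assumed candidate key: {TEACHER_ID, SUBJECT}.
teacher = {
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
}

# TEACHER_AGE depends on TEACHER_ID alone, so it moves to its own table.
teacher_detail = {(tid, age) for (tid, subject, age) in teacher}

# The remaining table records which teacher teaches which subject.
teacher_subject = {(tid, subject) for (tid, subject, age) in teacher}

# Joining the two tables on TEACHER_ID gives back the original rows.
rejoined = {
    (tid, subject, age)
    for (tid, subject) in teacher_subject
    for (tid2, age) in teacher_detail
    if tid == tid2
}

print(sorted(teacher_detail))
print(sorted(teacher_subject))
print(rejoined == teacher)
```

Each teacher's age is now stored once instead of once per subject, and no information is lost because the join restores the original table.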
A relation is in third normal form (3NF) if it is in 2NF and contains no transitive dependency of a
non-prime attribute on a candidate key.
3NF reduces data duplication and helps maintain data integrity.
A relation is in third normal form if at least one of the following conditions holds for every non-trivial
functional dependency X → Y.
X is a super key.
Y is a prime attribute, i.e., each element of Y is part of some candidate key.
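The two conditions can be turned into a small checker built on attribute closure. This is a sketch, not a definitive implementation: the schemas and functional dependencies are hypothetical, candidate keys are found by brute force (fine only for small schemas), and the second condition is applied to the attributes of Y that are not already in X.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):  # smallest sets first
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            is_minimal = not any(k <= cand for k in keys)
            if is_minimal and closure(cand, fds) == schema:
                keys.append(cand)
    return keys

def is_3nf(schema, fds):
    """Check every given X -> Y against the two 3NF conditions."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if rhs <= lhs:                   # trivial dependency
            continue
        if closure(lhs, fds) == schema:  # X is a super key
            continue
        if (rhs - lhs) <= prime:         # Y consists of prime attributes
            continue
        return False
    return True

# Hypothetical: ID -> ZIP and ZIP -> CITY makes CITY transitively dependent.
emp = {"ID", "ZIP", "CITY"}
emp_fds = [({"ID"}, {"ZIP"}), ({"ZIP"}, {"CITY"})]

# Hypothetical: L -> K is allowed because K is part of a candidate key.
jkl = {"J", "K", "L"}
jkl_fds = [({"J", "K"}, {"L"}), ({"L"}, {"K"})]

print(is_3nf(emp, emp_fds))                                # False
print(is_3nf({"ZIP", "CITY"}, [({"ZIP"}, {"CITY"})]))      # True
print(is_3nf(jkl, jkl_fds))                                # True
```

The first schema fails because ZIP is not a super key and CITY is not prime. Moving ZIP and CITY into their own relation removes the violation.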