M2 Notes

The document provides an overview of the relational model and relational algebra, detailing concepts such as relations, tuples, attributes, and domains. It explains the characteristics of relations, including ordering, NULL values, and interpretation, as well as constraints like entity integrity and referential integrity. Additionally, it introduces relational algebra operations like SELECT and PROJECT for manipulating relational data.

22CML42

MODULE 2

Relational Model and Relational Algebra


Mapping Conceptual Design into Logical Design

Dept. of CSE(AIML), GAT 1


Relational Model:
Relational Model Concepts
• The relational model represents the database as a collection of relations.
• Informally, each relation resembles a table of values or, to some extent, a flat file of records.
• When a relation is thought of as a table of values, each row in the table represents a collection of related data
values.
• A row represents a fact that typically corresponds to a real-world entity or relationship. The table name
and column names help to interpret the meaning of the values in each row.
• In formal relational model terminology:
o a row → a tuple, a column header → an attribute, and the table → a relation. The data type describing
the types of values that can appear in each column is represented by a domain of possible values.

Domains, Attributes, Tuples, and Relations

• A domain D is a set of atomic values. Atomic means that each value in the domain is indivisible as far as
the formal relational model is concerned. A common method of specifying a domain is to specify a data
type from which the data values forming the domain are drawn.
• Some examples of domains follow:
USA_phone_number: string of digits of length ten
SSN: string of digits of length nine
Name: string of characters beginning with an upper case letter
GPA: a real number between 0.0 and 4.0
Gender: a member of the set { female, male }
Dept_Code: a member of the set { CMPS, MATH, ENGL, PHYS, PSYC, ... }

✓ A relation schema R, denoted by R(A1, A2, … , An), is made up of a relation name R and a list of
attributes, A1, A2, … , An. R is called the name of this relation.
✓ Attribute: Ai is the name of a role played by some domain D in the relation schema R. D is called the
domain of Ai and is denoted by dom(Ai).
✓ Tuple: A tuple is a mapping from attributes to values drawn from the respective domains of those
attributes. A tuple is intended to describe some entity (or relationship between entities) in the miniworld.
✓ The degree (or arity) of a relation is the number of attributes n of its relation schema.
✓ A relation of degree seven, which stores information about university students, would contain seven
attributes describing each student, as follows:
STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa)
✓ Relational Database: A collection of relations, each one consistent with its specified relational schema.
✓ A relation (or relation state) r of the relation schema R(A1, A2, … , An), also denoted by r(R), is a set
of n-tuples r = {t1, t2, … , tm}. Each n-tuple t is an ordered list of n values t = <v1, v2, … , vn>, where
each value vi, 1 ≤ i ≤ n, is an element of dom(Ai) or is a special NULL value.
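The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-attribute STUDENT schema is a shortened version of the one above, the domain tests in DOMAINS and the helper names degree and conforms are hypothetical, the sample values are made up, and None stands in for NULL.

```python
# Hypothetical domains, each modelled as a membership test on a value.
DOMAINS = {
    "Name": lambda v: isinstance(v, str) and v[:1].isupper(),
    "Ssn": lambda v: isinstance(v, str) and len(v) == 9 and v.isdigit(),
    "Gpa": lambda v: isinstance(v, float) and 0.0 <= v <= 4.0,
}

# Relation schema STUDENT(Name, Ssn, Gpa): a relation name plus an attribute list.
STUDENT = ("Name", "Ssn", "Gpa")


def degree(schema):
    """Degree (arity) = number of attributes in the schema."""
    return len(schema)


def conforms(schema, t):
    """True if the n-tuple t takes each value from dom(Ai) or is NULL (None)."""
    return len(t) == len(schema) and all(
        v is None or DOMAINS[a](v) for a, v in zip(schema, t)
    )


# A relation state r(STUDENT) is a set of n-tuples, so a repeated tuple
# is stored only once.
r_student = {
    ("Benjamin Bayer", "305612435", 3.21),
    ("Benjamin Bayer", "305612435", 3.21),   # same tuple written twice
    ("Dick Davidson", "422112320", None),    # Gpa is NULL
}

print(degree(STUDENT), len(r_student))
```

Because the state is a Python set, the duplicate tuple collapses, which mirrors the rule that all tuples in a relation are distinct.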

Characteristics of Relations

1. Ordering of Tuples in a Relation


A relation is defined as a set of tuples. Mathematically, elements of a set have no order among them;
hence, tuples in a relation do not have any particular order. However, when tuples are represented on a
storage device, they must be organized in some fashion, and it may be advantageous, from a performance
standpoint, to organize them in a way that depends upon their content.

2. Ordering of Values within a Tuple


The order of attributes and their values is not important as long as the correspondence between
attributes and values is maintained.
A tuple can be considered as a set of (<attribute>, <value>) pairs, where each pair gives the value of
the mapping from an attribute Ai to a value vi from dom(Ai). The ordering of attributes is not important,
because the attribute name appears with its value.
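A small Python sketch of this alternative view, assuming a tuple is modelled as a dict of attribute and value pairs (the variable names and sample values are illustrative only):

```python
# The same tuple written with its (<attribute>, <value>) pairs in two orders.
t1 = {"Name": "Benjamin Bayer", "Ssn": "305612435", "Age": 19}
t2 = {"Age": 19, "Ssn": "305612435", "Name": "Benjamin Bayer"}

# As attribute -> value mappings the two are equal: order does not matter.
same_mapping = (t1 == t2)

# As plain ordered lists of values they differ, because position carries meaning.
same_list = (list(t1.values()) == list(t2.values()))

print(same_mapping, same_list)
```

The comparison shows why the mapping view makes attribute order irrelevant, while the ordered-list view depends on a fixed attribute order.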

3. Values and NULLs in the Tuples


Each value in a tuple is an atomic value; that is, it is not divisible into components.
An important concept is that of NULL values, which are used to represent the values of attributes that may
be unknown or may not apply to a tuple.
A NULL value can have several meanings, such as value unknown, value exists but is not available, or
attribute does not apply to this tuple.

4. Interpretation (Meaning) of a Relation


The relation schema can be interpreted as a declaration or a type of assertion.
Each tuple in the relation can then be interpreted as a fact or a particular instance of the assertion.
Each relation can be viewed as a predicate and each tuple in that relation can be viewed as an
assertion for which that predicate is satisfied (i.e., has value true) for the combination of values in it.
Example: There exists a student having name Benjamin Bayer, having SSN 305-61-2435, having age
19, etc.

Relational Model Notation

The following notation is used for presentation:


• A relation schema R of degree n is denoted by R(A1, A2, … , An).
• The uppercase letters Q, R, S denote relation names.
• The lowercase letters q, r, s denote relation states.
• The letters t, u, v denote tuples.
• In general, the name of a relation schema such as STUDENT also indicates the current set of tuples in that
relation—the current relation state—whereas STUDENT(Name, Ssn, …) refers only to the relation
schema.
• An attribute A can be qualified with the relation name R to which it belongs by using the dot notation
R.A—for example, STUDENT.Name or STUDENT.Age. This is because the same name may be used for
two attributes in different relations.
• An n-tuple t in a relation r(R) is denoted by t = <v1, v2, … , vn>, where vi is the value corresponding to
attribute Ai. The following notation refers to component values of tuples:
o Both t[Ai] and t.Ai (and sometimes t[i]) refer to the value vi in t for attribute Ai.
o Both t[Au, Aw, … , Az] and t.(Au, Aw, … , Az), where Au, Aw, … , Az is a list of attributes from
R, refer to the subtuple of values from t corresponding to the attributes specified in the list.

Relational Model Constraints and Relational Database Schemas


Constraints on databases can generally be divided into three main categories:
1. Constraints that are inherent in the data model, known as inherent model-based constraints or implicit
constraints.
2. Constraints that can be directly expressed in the schemas of the data model, typically by specifying them in the
DDL, known as schema-based constraints or explicit constraints.
3. Constraints that cannot be directly expressed in the schemas of the data model, and hence must be expressed
and enforced by the application programs or in some other way, known as application-based or semantic
constraints or business rules.

The schema-based constraints include domain constraints, key constraints, constraints on NULLs, entity
integrity constraints, and referential integrity constraints.

1. Domain Constraints
✓ Domain constraints specify that within each tuple, the value of each attribute A must be an atomic value
from the domain dom(A).
✓ The data types associated with domains typically include standard numeric data types for integers and real
numbers. Characters, Booleans, fixed-length strings, and variable-length strings are also available, as are
date, time, timestamp, and other special data types.

2. Key Constraints and Constraints on NULL Values


✓ In the formal relational model, a relation is defined as a set of tuples.
✓ By definition, all elements of a set are distinct; hence, all tuples in a relation must also be distinct.
✓ This means that no two tuples can have the same combination of values for all their attributes. Usually,
there are other subsets of attributes of a relation schema R with the property that no two tuples in any relation
state r of R should have the same combination of values for these attributes.
✓ Suppose that we denote one such subset of attributes by SK; then for any two distinct tuples t1 and t2 in a
relation state r of R, we have the constraint that:
t1[SK] ≠ t2[SK]

Superkey
• A superkey SK specifies a uniqueness constraint that no two distinct tuples in any state r of R can have
the same value for SK.
• A key K of a relation schema R is a superkey of R with the additional property that removing any
attribute A from K leaves a set of attributes K′ that is not a superkey of R any more.
Hence, a key satisfies two properties:
1. Two distinct tuples in any state of the relation cannot have identical values for (all) the attributes in the
key. This uniqueness property also applies to a superkey.
2. It is a minimal superkey, that is, a superkey from which we cannot remove any attributes and still have
the uniqueness constraint hold. This minimality property is required for a key but is optional for a superkey.
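The two properties can be checked against a single relation state with a short Python sketch. The function names and the sample CAR tuples are assumptions made for illustration. Note also that a key is a property of the schema (it must hold in every state), so a check on one state can refute a proposed key but can never prove it.

```python
def is_superkey(state, attrs):
    """True if no two distinct tuples in this state agree on all of attrs."""
    seen = set()
    for t in state:
        sub = tuple(t[a] for a in attrs)   # the subtuple t[SK]
        if sub in seen:
            return False
        seen.add(sub)
    return True


def is_key(state, attrs):
    """A key is a minimal superkey: dropping any one attribute breaks uniqueness."""
    if not is_superkey(state, attrs):
        return False
    return all(
        not is_superkey(state, [a for a in attrs if a != removed])
        for removed in attrs
    )


# Made-up state of a CAR relation.
CAR = [
    {"License_number": "TX-ABC123", "Engine_serial_number": "A69352", "Make": "Ford"},
    {"License_number": "FL-CBA321", "Engine_serial_number": "B43696", "Make": "Ford"},
    {"License_number": "NY-MPO22", "Engine_serial_number": "X83554", "Make": "Oldsmobile"},
]

print(is_key(CAR, ["License_number"]), is_key(CAR, ["License_number", "Make"]))
```

Here {License_number, Make} is a superkey but not a key, because removing Make still leaves a superkey.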

Candidate key
A relation schema may have more than one key. In this case, each of the keys is called a
candidate key. For example, the CAR relation has two candidate keys: License_number and
Engine_serial_number.

Primary key
It is common to designate one of the candidate keys as the primary key of the relation. This
is the candidate key whose values are used to identify tuples in the relation. We use the
convention that the attributes that form the primary key of a relation schema are underlined.
Other candidate keys are designated as unique keys and are not underlined.

3. Relational Databases and Relational Database Schemas


✓ A relational database is a collection of many relations.
✓ A relational database schema S is a set of relation schemas S = {R1, R2, … , Rm} and a set of integrity
constraints IC.
✓ A relational database state DB of S is a set of relation states DB = {r1, r2, … ,rm} such that each ri is
a state of Ri and such that the ri relation states satisfy the integrity constraints specified in IC.
✓ For example, the relational database schema that we call COMPANY = {EMPLOYEE, DEPARTMENT,
DEPT_LOCATIONS, PROJECT, WORKS_ON, DEPENDENT} consists of six relation schemas.
✓ When we refer to a relational database, we implicitly include both its schema and its current state. A
database state that does not obey all the integrity constraints is called an invalid state, and a state that
satisfies all the constraints in the defined set of integrity constraints IC is called a valid state.

4. Entity Integrity, Referential Integrity, and Foreign Keys

Entity Integrity Constraint


• The entity integrity constraint states that no primary key value can be NULL. This is because the
primary key value is used to identify individual tuples in a relation.
• Key constraints and entity integrity constraints are specified on individual relations.
Referential Integrity Constraint
• The referential integrity constraint is specified between two relations and is used to maintain the
consistency among tuples in the two relations.
• Informally, the referential integrity constraint states that a tuple in one relation that refers to
another relation must refer to an existing tuple in that relation.
• For example, the attribute Dno of EMPLOYEE gives the department number for which each
employee works; hence, its value in every EMPLOYEE tuple must match the Dnumber value of
some tuple in the DEPARTMENT relation.
Foreign key
• The conditions for a foreign key, given below, specify a referential integrity constraint between
the two relation schemas R1 and R2.
A set of attributes FK in relation schema R1 is a foreign key of R1 that references relation R2 if it
satisfies the following rules:
1. The attributes in FK have the same domain(s) as the primary key attributes PK of R2; the attributes
FK are said to reference or refer to the relation R2.
2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some tuple
t2 in the current state r2(R2) or is NULL. In the former case, we have t1[FK] = t2[PK], and we say that
the tuple t1 references or refers to the tuple t2.
In this definition, R1 is called the referencing relation and R2 is the referenced relation. If these two
conditions hold, a referential integrity constraint from R1 to R2 is said to hold.
In the EMPLOYEE relation, the attribute Dno refers to the department for which an employee works;
hence, we designate Dno to be a foreign key of EMPLOYEE referencing the DEPARTMENT relation.

This means that a value of Dno in any tuple t1 of the EMPLOYEE relation must match a value of
the primary key of DEPARTMENT (the Dnumber attribute) in some tuple t2 of the
DEPARTMENT relation, or the value of Dno can be NULL if the employee does not belong to a department
or will be assigned to a department later.
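The entity integrity and referential integrity rules above can be sketched as two checks in Python. The function names and the sample tuples are hypothetical, None stands in for NULL, and a foreign key is treated as NULL only when all of its attributes are NULL.

```python
def entity_integrity_violations(state, pk):
    """Tuples whose primary key contains a NULL (None) value."""
    return [t for t in state if any(t[a] is None for a in pk)]


def referential_integrity_violations(referencing, fk, referenced, pk):
    """Tuples t1 whose FK value is neither NULL nor equal to some t2[PK]."""
    pk_values = {tuple(t[a] for a in pk) for t in referenced}
    bad = []
    for t1 in referencing:
        value = tuple(t1[a] for a in fk)
        if all(v is None for v in value):
            continue                      # a NULL foreign key is allowed
        if value not in pk_values:
            bad.append(t1)
    return bad


# Made-up states of DEPARTMENT and EMPLOYEE.
DEPARTMENT = [{"Dnumber": 1}, {"Dnumber": 4}, {"Dnumber": 5}]
EMPLOYEE = [
    {"Ssn": "123456789", "Dno": 5},
    {"Ssn": "999887777", "Dno": None},    # not yet assigned to a department
    {"Ssn": "453453453", "Dno": 7},       # no department 7 exists
]

dangling = referential_integrity_violations(EMPLOYEE, ["Dno"], DEPARTMENT, ["Dnumber"])
print(dangling)
```

Only the tuple with Dno = 7 is reported, since the NULL value is permitted and 5 matches an existing Dnumber.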

RELATIONAL ALGEBRA
Unary and Binary relational operations
SELECT and PROJECT
The SELECT Operation

The SELECT operation is used to choose a subset of the tuples from a relation that satisfies a selection
condition.
It restricts the tuples in a relation to only those tuples that satisfy the condition.
It can also be visualized as a horizontal partition of the relation into two sets of tuples—those tuples that
satisfy the condition and are selected, and those tuples that do not satisfy the condition and are discarded.
For example, to select the EMPLOYEE tuples whose department is 4, or those whose salary is greater
than $30,000, we can write each condition as a separate SELECT operation:

σ_{Dno=4}(EMPLOYEE)
σ_{Salary>30000}(EMPLOYEE)

In general, the SELECT operation is denoted by


σ_{<selection condition>}(R)

where the symbol σ (sigma) is used to denote the SELECT operator and the selection condition is a
Boolean expression (condition) specified on the attributes of relation R.
The Boolean expression specified in <selection condition> is made up of a number of clauses of the form:
<attribute name> <comparison op> <constant value>
or
<attribute name> <comparison op> <attribute name>
Clauses can be connected by the standard Boolean operators and, or, and not to form a general
selection condition.
For example, to select the tuples for all employees who either work in department 4 and
make over $25,000 per year, or work in department 5 and make over $30,000:

σ_{(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)}(EMPLOYEE)

• The Boolean conditions AND, OR, and NOT have their normal interpretation, as follows:
o (cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE; otherwise, it is FALSE.
o (cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are TRUE; otherwise, it is FALSE.
o (NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.
• The SELECT operator is unary; that is, it is applied to a single relation. Moreover, the selection
operation is applied to each tuple individually; hence, selection conditions cannot involve more than
one tuple.
• The degree of the relation resulting from a SELECT operation (its number of attributes) is the
same as the degree of R.
• The SELECT operation is commutative; that is,
σ_{<cond1>}(σ_{<cond2>}(R)) = σ_{<cond2>}(σ_{<cond1>}(R))
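As a rough illustration of SELECT, the sketch below models a relation as a Python list of dicts and a selection condition as a function of one tuple. The helper name select and the sample EMPLOYEE tuples are assumptions, not part of the notes.

```python
# Made-up sample state of EMPLOYEE; each tuple is a dict of attribute -> value.
EMPLOYEE = [
    {"Fname": "John", "Dno": 5, "Salary": 30000},
    {"Fname": "Franklin", "Dno": 5, "Salary": 40000},
    {"Fname": "Alicia", "Dno": 4, "Salary": 25000},
    {"Fname": "Jennifer", "Dno": 4, "Salary": 43000},
    {"Fname": "Ramesh", "Dno": 5, "Salary": 38000},
]


def select(condition, relation):
    """SELECT: keep only the tuples for which the condition is TRUE."""
    return [t for t in relation if condition(t)]


def cond(t):
    # (Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)
    return (t["Dno"] == 4 and t["Salary"] > 25000) or (
        t["Dno"] == 5 and t["Salary"] > 30000
    )


names = [t["Fname"] for t in select(cond, EMPLOYEE)]

# Commutativity: applying two selections in either order gives the same result.
in_dept_5 = lambda t: t["Dno"] == 5
well_paid = lambda t: t["Salary"] > 30000
one_order = select(in_dept_5, select(well_paid, EMPLOYEE))
other_order = select(well_paid, select(in_dept_5, EMPLOYEE))

print(names, one_order == other_order)
```

The condition is evaluated on one tuple at a time, and every result tuple keeps all attributes, so the degree is unchanged.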
The PROJECT Operation

• The PROJECT operation selects certain columns from the table and discards the other columns.
• The result of the PROJECT operation can be visualized as a vertical partition of the relation into two
relations: one has the needed columns (attributes) and contains the result of the operation, and the
other contains the discarded columns.
• For example, to list each employee’s first and last name and salary, we can use the PROJECT
operation as follows:
π_{Lname, Fname, Salary}(EMPLOYEE)
The general form of the PROJECT operation is
π_{<attribute list>}(R)
o where π (pi) is the symbol used to represent the PROJECT operation, and <attribute list> is the desired
sublist of attributes from the attributes of relation R.
o The result of the PROJECT operation has only the attributes specified in <attribute list>, in the same
order as they appear in the list. Hence, its degree is equal to the number of attributes in <attribute list>.
o The PROJECT operation removes any duplicate tuples, so the result of the PROJECT operation is
a set of distinct tuples, and hence a valid relation. This is known as duplicate elimination.
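A minimal Python sketch of PROJECT with duplicate elimination, assuming a relation is a list of dicts. The helper name project and the sample tuples are illustrative only.

```python
# Made-up sample state of EMPLOYEE.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Dno": 5, "Salary": 30000},
    {"Fname": "Franklin", "Lname": "Wong", "Dno": 5, "Salary": 40000},
    {"Fname": "Alicia", "Lname": "Zelaya", "Dno": 4, "Salary": 25000},
    {"Fname": "Jennifer", "Lname": "Wallace", "Dno": 4, "Salary": 43000},
]


def project(attrs, relation):
    """PROJECT: keep only attrs, in the listed order, and drop duplicate tuples."""
    result, seen = [], set()
    for t in relation:
        sub = tuple(t[a] for a in attrs)
        if sub not in seen:               # duplicate elimination
            seen.add(sub)
            result.append(dict(zip(attrs, sub)))
    return result


names_and_salaries = project(["Lname", "Fname", "Salary"], EMPLOYEE)
departments = project(["Dno"], EMPLOYEE)

print(len(names_and_salaries), departments)
```

Projecting on Dno alone yields two tuples rather than four, because the repeated department numbers are eliminated.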
Sequences of Operations and the RENAME Operation
• The relations that result from the operations shown above do not have any names.
• Either we can write the operations as a single relational algebra expression by nesting the operations, or
we can apply one operation at a time and create intermediate result relations.
• In the latter case, we must give names to the relations that hold the intermediate results.
• For example, to retrieve the first name, last name, and salary of all employees who work in department
number 5, apply a SELECT and a PROJECT operation.

π_{Fname, Lname, Salary}(σ_{Dno=5}(EMPLOYEE))

• Alternatively, we can explicitly show the sequence of operations, giving a name to each intermediate
relation, and using the assignment operation, denoted by ← (left arrow), as follows:

DEP5_EMPS ← σ_{Dno=5}(EMPLOYEE)
RESULT ← π_{Fname, Lname, Salary}(DEP5_EMPS)

• It is sometimes simpler to break down a complex sequence of operations by specifying intermediate
result relations than to write a single relational algebra expression.
• We can also use this technique to rename the attributes in the intermediate and result relations.
• To rename the attributes in a relation, we simply list the new attribute names in parentheses, as in the
following example:
TEMP ← σ_{Dno=5}(EMPLOYEE)
R(First_name, Last_name, Salary) ← π_{Fname, Lname, Salary}(TEMP)

• We can also define a formal RENAME operation, which can rename either the relation name or the
attribute names, or both, as a unary operator.
• The general RENAME operation when applied to a relation R of degree n is denoted by any of the
following three forms:
ρ_{S(B1, B2, ... , Bn)}(R) or ρ_{S}(R) or ρ_{(B1, B2, ... , Bn)}(R)
where the symbol ρ (rho) is used to denote the RENAME operator, S is the new relation name, and B1,
B2, … , Bn are the new attribute names.
• The first expression renames both the relation and its attributes, the second renames the relation only, and
the third renames the attributes only.
• Example-1: Write a query to display the roll number and name of the male students from the Student
relation, renaming the result as Male Student and the attributes RollNo and SName as (Sno, Name).
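One possible way to work Example-1 is sketched below in Python. The notes do not name the attribute that stores a student's gender, so the attribute Gender with value 'M' is an assumption, as are the helper functions and the sample tuples. Under that assumption the query corresponds to a SELECT on Gender, followed by a PROJECT on RollNo and SName, followed by a RENAME.

```python
def select(condition, relation):
    return [t for t in relation if condition(t)]


def project(attrs, relation):
    result, seen = [], set()
    for t in relation:
        sub = tuple(t[a] for a in attrs)
        if sub not in seen:
            seen.add(sub)
            result.append(dict(zip(attrs, sub)))
    return result


def rename(mapping, relation):
    """RENAME: replace attribute names according to mapping (old -> new)."""
    return [{mapping.get(a, a): v for a, v in t.items()} for t in relation]


# Made-up state of Student; the Gender attribute is an assumption.
Student = [
    {"RollNo": 1, "SName": "Arun", "Gender": "M"},
    {"RollNo": 2, "SName": "Bhavana", "Gender": "F"},
    {"RollNo": 3, "SName": "Chetan", "Gender": "M"},
]

# Male_Student(Sno, Name) <- PROJECT RollNo, SName (SELECT Gender='M' (Student))
Male_Student = rename(
    {"RollNo": "Sno", "SName": "Name"},
    project(["RollNo", "SName"], select(lambda t: t["Gender"] == "M", Student)),
)

print(Male_Student)
```

Giving the intermediate result the Python name Male_Student plays the role of renaming the relation itself.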

Relational Algebra Operations from Set Theory


The UNION, INTERSECTION, and MINUS Operations
■ UNION: The result of this operation, denoted by R ∪ S, is a relation that includes all tuples that are either
in R or in S or in both R and S. Duplicate tuples are eliminated.
■ INTERSECTION: The result of this operation, denoted by R ∩ S, is a relation that includes all tuples that
are in both R and S.
■ SET DIFFERENCE (or MINUS): The result of this operation, denoted by R – S, is a relation that includes
all tuples that are in R but not in S.
✓ These are binary operations; that is, each is applied to two sets (of tuples).
✓ Two relations R(A1, A2, … , An) and S(B1, B2, … , Bn) are said to be union compatible (or type
compatible) if they have the same degree n and if dom(Ai) = dom(Bi) for 1 ≤ i ≤ n. This means that the
two relations have the same number of attributes and each corresponding pair of attributes has the same
domain.
✓ For example, to retrieve the Social Security numbers of all employees who either work in department 5
or directly supervise an employee who works in department 5, we can use the UNION operation as follows:
DEP5_EMPS ← σ_{Dno=5}(EMPLOYEE)
RESULT1 ← π_{Ssn}(DEP5_EMPS)
RESULT2(Ssn) ← π_{Super_ssn}(DEP5_EMPS)
RESULT ← RESULT1 ∪ RESULT2

✓ Both UNION and INTERSECTION are commutative operations; that is,


R ∪ S = S ∪ R and R ∩ S = S ∩ R

✓ Both UNION and INTERSECTION can be treated as n-ary operations applicable to any number of
relations because both are also associative operations; that is,
R ∪ (S ∪ T ) = (R ∪ S) ∪ T and (R ∩ S) ∩ T = R ∩ (S ∩ T)
✓ The MINUS operation is not commutative; that is, in general, R − S ≠ S − R
✓ The INTERSECTION can be expressed in terms of union and set difference as follows:
R ∩ S = ((R ∪ S) − (R − S)) − (S − R)
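These set operations map directly onto Python's built-in set type, as the sketch below shows. The relations R and S hold made-up (first name, last name) tuples, and union_compatible is a hypothetical helper that compares domain names by position.

```python
def union_compatible(domains_r, domains_s):
    """Same degree and identical domains for each corresponding attribute pair."""
    return list(domains_r) == list(domains_s)


# Two made-up union-compatible relations of degree 2.
R = {("Susan", "Yao"), ("Ramesh", "Shah"), ("Johnny", "Kohler")}
S = {("John", "Smith"), ("Susan", "Yao"), ("Ramesh", "Shah")}

union = R | S            # R UNION S, duplicates eliminated automatically
intersection = R & S     # R INTERSECTION S
r_minus_s = R - S        # R MINUS S
s_minus_r = S - R        # S MINUS R

# INTERSECTION expressed with UNION and MINUS only.
derived = ((R | S) - (R - S)) - (S - R)

print(len(union), intersection == derived, r_minus_s == s_minus_r)
```

The last comparison is False, which illustrates that MINUS is not commutative.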

The CARTESIAN PRODUCT (CROSS PRODUCT) Operation


✓ The CARTESIAN PRODUCT operation, also known as CROSS PRODUCT or CROSS JOIN, is
denoted by ×.
✓ This is also a binary set operation, but the relations on which it is applied do not have to be union
compatible.
✓ This set operation produces a new element by combining every member (tuple) from one relation (set)
with every member (tuple) from the other relation (set).
✓ In general, the result of R(A1, A2, ..., An) × S(B1, B2, ..., Bm) is a relation Q with degree n + m
attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order.
✓ The resulting relation Q has one tuple for each combination of tuples, one from R and one from S.
✓ Hence, if R has n_R tuples and S has n_S tuples, then R × S will have n_R * n_S tuples.
✓ For example, suppose that we want to retrieve a list of names of each female employee’s dependents. We
can do this as follows:
FEMALE_EMPS ← σ_{Sex=‘F’}(EMPLOYEE)
EMPNAMES ← π_{Fname, Lname, Ssn}(FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σ_{Ssn=Essn}(EMP_DEPENDENTS)
RESULT ← π_{Fname, Lname, Dependent_name}(ACTUAL_DEPENDENTS)
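The five steps of the female-dependents query can be traced with a short Python sketch. The helper functions and the sample tuples are assumptions, and the product helper assumes the two relations have no attribute names in common, which holds here because EMPNAMES keeps only Fname, Lname, and Ssn.

```python
def select(condition, relation):
    return [t for t in relation if condition(t)]


def project(attrs, relation):
    result, seen = [], set()
    for t in relation:
        sub = tuple(t[a] for a in attrs)
        if sub not in seen:
            seen.add(sub)
            result.append(dict(zip(attrs, sub)))
    return result


def product(r, s):
    """CARTESIAN PRODUCT: one combined tuple per pair (assumes disjoint names)."""
    return [{**a, **b} for a in r for b in s]


# Made-up states of EMPLOYEE and DEPENDENT.
EMPLOYEE = [
    {"Fname": "Alicia", "Lname": "Zelaya", "Ssn": "999887777", "Sex": "F"},
    {"Fname": "Jennifer", "Lname": "Wallace", "Ssn": "987654321", "Sex": "F"},
    {"Fname": "John", "Lname": "Smith", "Ssn": "123456789", "Sex": "M"},
]
DEPENDENT = [
    {"Essn": "987654321", "Dependent_name": "Abner"},
    {"Essn": "123456789", "Dependent_name": "Alice"},
    {"Essn": "123456789", "Dependent_name": "Michael"},
]

FEMALE_EMPS = select(lambda t: t["Sex"] == "F", EMPLOYEE)
EMPNAMES = project(["Fname", "Lname", "Ssn"], FEMALE_EMPS)
EMP_DEPENDENTS = product(EMPNAMES, DEPENDENT)
ACTUAL_DEPENDENTS = select(lambda t: t["Ssn"] == t["Essn"], EMP_DEPENDENTS)
RESULT = project(["Fname", "Lname", "Dependent_name"], ACTUAL_DEPENDENTS)

print(len(EMP_DEPENDENTS), RESULT)
```

The product has 2 × 3 = 6 tuples, most of them meaningless pairings, and the following SELECT keeps only the pair whose Ssn and Essn match.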

Binary Relational Operations: JOIN and DIVISION


The JOIN Operation
✓ The JOIN operation, denoted by ⋈, is used to combine related tuples from two relations into single
“longer” tuples.
✓ This operation is very important for any relational database with more than a single relation because it
allows us to process relationships among relations.
✓ To illustrate JOIN, suppose that we want to retrieve the name of the manager of each department, as
follows:
DEPT_MGR ← DEPARTMENT ⋈_{Mgr_ssn=Ssn} EMPLOYEE
RESULT ← π_{Dname, Lname, Fname}(DEPT_MGR)

✓ The general form of a JOIN operation on two relations R(A1, A2, … , An) and S(B1, B2, … , Bm) is
R ⋈_{<join condition>} S
✓ The result of the JOIN is a relation Q with n + m attributes Q(A1, A2, … , An, B1, B2, … , Bm) in that
order; Q has one tuple for each combination of tuples (one from R and one from S) whenever the
combination satisfies the join condition.

✓ The main difference between CARTESIAN PRODUCT and JOIN is that in JOIN, only combinations of
tuples satisfying the join condition appear in the result, whereas in the CARTESIAN PRODUCT all
combinations of tuples are included in the result.
✓ The join condition is specified on attributes from the two relations R and S and is evaluated for each
combination of tuples. Each tuple combination for which the join condition evaluates to TRUE is included
in the resulting relation Q as a single combined tuple.
✓ A general join condition is of the form
<condition> AND <condition> AND … AND <condition>
where each <condition> is of the form Ai θ Bj, Ai is an attribute of R, Bj is an attribute of S, Ai
and Bj have the same domain, and θ (theta) is one of the comparison operators {=, <, ≤, >, ≥, ≠}.
A JOIN operation with such a general join condition is called a THETA JOIN.
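A THETA JOIN can be sketched in Python as a nested loop that tests the join condition on each combined tuple. The function names and the sample DEPARTMENT and EMPLOYEE tuples are assumptions, and the sketch assumes the two relations have no attribute names in common.

```python
def product(r, s):
    return [{**a, **b} for a in r for b in s]


def select(condition, relation):
    return [t for t in relation if condition(t)]


def theta_join(r, s, condition):
    """JOIN: combine each pair of tuples and keep it if the condition is TRUE."""
    result = []
    for a in r:
        for b in s:
            combined = {**a, **b}
            if condition(combined):
                result.append(combined)
    return result


# Made-up states of DEPARTMENT and EMPLOYEE.
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "333445555"},
    {"Dname": "Administration", "Dnumber": 4, "Mgr_ssn": "987654321"},
]
EMPLOYEE = [
    {"Fname": "Franklin", "Lname": "Wong", "Ssn": "333445555"},
    {"Fname": "Jennifer", "Lname": "Wallace", "Ssn": "987654321"},
    {"Fname": "John", "Lname": "Smith", "Ssn": "123456789"},
]

is_manager = lambda t: t["Mgr_ssn"] == t["Ssn"]
DEPT_MGR = theta_join(DEPARTMENT, EMPLOYEE, is_manager)

# The same result as a CARTESIAN PRODUCT followed by a SELECT.
via_product = select(is_manager, product(DEPARTMENT, EMPLOYEE))

print([(t["Dname"], t["Lname"]) for t in DEPT_MGR], DEPT_MGR == via_product)
```

Only two of the six possible combinations satisfy Mgr_ssn = Ssn, so the join result has two tuples of degree 3 + 3 = 6.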

Variations of JOIN:
The EQUIJOIN and NATURAL JOIN
✓ The most common use of JOIN involves join conditions with equality comparisons only. Such a JOIN,
where the only comparison operator used is =, is called an EQUIJOIN.
✓ Notice that in the result of an EQUIJOIN we always have one or more pairs of attributes that have identical
values in every tuple.
✓ For example, the values of the attributes Mgr_ssn and Ssn are identical in every tuple of DEPT_MGR (the
EQUIJOIN result) because the equality join condition specified on these two attributes requires the values
to be identical in every tuple in the result.
✓ Because one of each pair of attributes with identical values is superfluous, a new operation called
NATURAL JOIN, denoted by *, was created to get rid of the second (superfluous) attribute in an
EQUIJOIN condition.
✓ The standard definition of NATURAL JOIN requires that the two join attributes (or each pair of join
attributes) have the same name in both relations. If this is not the case, a renaming operation is applied first.
✓ Suppose we want to combine each PROJECT tuple with the DEPARTMENT tuple that controls the
project. In the following example, first we rename the Dnumber attribute of DEPARTMENT to Dnum,
so that it has the same name as the Dnum attribute in PROJECT, and then we apply NATURAL JOIN:
PROJ_DEPT ← PROJECT * ρ_{(Dname, Dnum, Mgr_ssn, Mgr_start_date)}(DEPARTMENT)
✓ The same query can be done in two steps by creating an intermediate table DEPT as follows:
DEPT ← ρ_{(Dname, Dnum, Mgr_ssn, Mgr_start_date)}(DEPARTMENT)
PROJ_DEPT ← PROJECT * DEPT
✓ The attribute Dnum is called the join attribute for the NATURAL JOIN operation, because it is the only
attribute with the same name in both relations.
✓ In general, the join condition for NATURAL JOIN is constructed by equating each pair of join attributes
that have the same name in the two relations and combining these conditions with AND.
✓ A single JOIN operation is used to combine data from two relations so that related information can be
presented in a single table. These operations are also known as inner joins.
✓ A more general, but nonstandard, definition for NATURAL JOIN is Q ← R *_{(<list1>),(<list2>)} S
✓ In this case, <list1> specifies a list of i attributes from R, and <list2> specifies a list of i attributes from S.
✓ The NATURAL JOIN or EQUIJOIN operation can also be specified among multiple tables, leading to
an n-way join. For example, consider the following three-way join:
((PROJECT ⋈_{Dnum=Dnumber} DEPARTMENT) ⋈_{Mgr_ssn=Ssn} EMPLOYEE)
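The PROJ_DEPT example above (rename Dnumber to Dnum, then NATURAL JOIN) can be sketched in Python as follows. The helper names and the sample tuples are assumptions. Merging the two dicts keeps a single copy of each join attribute, which is how the superfluous attribute disappears, and the sketch assumes the inputs contain no duplicate tuples.

```python
def rename(mapping, relation):
    """RENAME: replace attribute names according to mapping (old -> new)."""
    return [{mapping.get(a, a): v for a, v in t.items()} for t in relation]


def natural_join(r, s):
    """NATURAL JOIN: equate all same-named attributes and keep one copy of each."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()          # the join attributes
            if all(a[c] == b[c] for c in common):
                result.append({**a, **b})
    return result


# Made-up states of PROJECT and DEPARTMENT.
PROJECT = [
    {"Pname": "ProductX", "Pnumber": 1, "Dnum": 5},
    {"Pname": "Computerization", "Pnumber": 10, "Dnum": 4},
]
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "333445555"},
    {"Dname": "Administration", "Dnumber": 4, "Mgr_ssn": "987654321"},
    {"Dname": "Headquarters", "Dnumber": 1, "Mgr_ssn": "888665555"},
]

DEPT = rename({"Dnumber": "Dnum"}, DEPARTMENT)
PROJ_DEPT = natural_join(PROJECT, DEPT)

print([(t["Pname"], t["Dname"]) for t in PROJ_DEPT])
```

Each result tuple has five attributes rather than six, because Dnum appears only once, and the Headquarters department drops out because it controls no project in this sample state.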

A Complete Set of Relational Algebra Operations


✓ It has been shown that the set of relational algebra operations {σ, π, ∪, ρ, –, ×} is a complete set; that is,
any of the other original relational algebra operations can be expressed as a sequence of operations from
this set.
✓ For example, the INTERSECTION operation can be expressed by using UNION and MINUS as follows:
R ∩ S ≡ (R ∪ S) – ((R – S) ∪ (S – R))
✓ A JOIN operation can be specified as a CARTESIAN PRODUCT followed by a SELECT operation:
R ⋈_{<condition>} S ≡ σ_{<condition>}(R × S)
✓ A NATURAL JOIN can be specified as a CARTESIAN PRODUCT preceded by RENAME and followed
by SELECT and PROJECT operations.
The DIVISION Operation
✓ The DIVISION operation, denoted by ÷, is useful for a special kind of query that sometimes occurs in
database applications.
✓ An example is: Retrieve the names of employees who work on all the projects that ‘John Smith’ works on.
To express this query using the DIVISION operation, proceed as follows. First, retrieve the list of project
numbers that ‘John Smith’ works on in the intermediate relation SMITH_PNOS:
SMITH ← σ_{Fname=‘John’ AND Lname=‘Smith’}(EMPLOYEE)
SMITH_PNOS ← π_{Pno}(WORKS_ON ⋈_{Essn=Ssn} SMITH)
✓ Next, create a relation that includes a tuple whenever the employee whose Ssn is Essn works on the
project whose number is Pno in the intermediate relation SSN_PNOS:
SSN_PNOS ← π_{Essn, Pno}(WORKS_ON)
✓ Finally, apply the DIVISION operation to the two relations, which gives the desired employees’ Social
Security numbers:
SSNS(Ssn) ← SSN_PNOS ÷ SMITH_PNOS
RESULT ← π_{Fname, Lname}(SSNS * EMPLOYEE)
✓ In general, the DIVISION operation is applied to two relations R(Z) ÷ S(X), where the attributes of S are
a subset of the attributes of R; that is, X ⊆ Z. Let Y be the set of attributes of R that are not attributes of S;
that is, Y = Z − X. The result is a relation T(Y) that includes a tuple t if, for every tuple tS in S, there is a
tuple tR in R with tR[Y] = t and tR[X] = tS.
✓ The DIVISION operation is defined for convenience for dealing with queries that involve universal
quantification or the all condition.
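A minimal Python sketch of DIVISION for R(Z) ÷ S(X), where X is a subset of Z and Y = Z − X. The function name divide, its parameter names, and the sample tuples are assumptions. The sketch groups the tuples of R by their Y-values and keeps a Y-value only if its group covers every tuple of S.

```python
def divide(r, s, y_attrs, x_attrs):
    """R(Z) DIVIDED BY S(X): Y-values of R that are paired with every tuple of S."""
    s_values = {tuple(t[a] for a in x_attrs) for t in s}
    groups = {}                                   # Y-value -> set of X-values
    for t in r:
        y = tuple(t[a] for a in y_attrs)
        groups.setdefault(y, set()).add(tuple(t[a] for a in x_attrs))
    return [dict(zip(y_attrs, y)) for y, xs in groups.items() if s_values <= xs]


# Made-up state of SSN_PNOS(Essn, Pno) and SMITH_PNOS(Pno).
SSN_PNOS = [
    {"Essn": "123456789", "Pno": 1},
    {"Essn": "123456789", "Pno": 2},
    {"Essn": "666884444", "Pno": 3},
    {"Essn": "453453453", "Pno": 1},
    {"Essn": "453453453", "Pno": 2},
    {"Essn": "333445555", "Pno": 2},
    {"Essn": "333445555", "Pno": 3},
]
SMITH_PNOS = [{"Pno": 1}, {"Pno": 2}]

SSNS = divide(SSN_PNOS, SMITH_PNOS, y_attrs=["Essn"], x_attrs=["Pno"])

print(SSNS)
```

Employee 333445555 works on project 2 but not on project 1, so that Essn is excluded; only employees paired with all of Smith's projects remain.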

Notation for Query Trees


✓ This section describes the notation typically used in relational systems to represent queries internally.
✓ The notation is called a query tree or sometimes it is known as a query evaluation tree or query execution
tree.
✓ It includes the relational algebra operations being executed and is used as a possible data structure for the
internal representation of the query in an RDBMS.
✓ A query tree is a tree data structure that corresponds to a relational algebra expression.
✓ It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra
operations as internal nodes.
✓ An execution of the query tree consists of executing an internal node operation whenever its operands
(represented by its child nodes) are available, and then replacing that internal node by the relation that
results from executing the operation.
✓ The execution terminates when the root node is executed and produces the result relation for the query.

π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation=‘Stafford’}(PROJECT)) ⋈_{Dnum=Dnumber} (DEPARTMENT))
⋈_{Mgr_ssn=Ssn} (EMPLOYEE))

✓ In the query tree for the above query, the three leaf nodes P, D, and E represent the three relations
PROJECT, DEPARTMENT, and EMPLOYEE.

✓ In order to execute this query, the node marked (1) in the figure must begin execution before node (2) because
some resulting tuples of operation (1) must be available before we can begin to execute operation (2).
Similarly, node (2) must begin to execute and produce results before node (3) can start execution, and so
on.
✓ In general, a query tree gives a good visual representation and understanding of the query in terms of the
relational operations it uses and is recommended as an additional means for expressing queries in relational
algebra.
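The bottom-up execution of a query tree can be sketched in Python with a small node class. This is only an illustration: the Node class, its field names, and the sample tuples are assumptions rather than the internal representation of any particular RDBMS, each node finishes before its parent starts (a real system may pipeline tuples instead), and the final projection keeps only Pnumber, Dnum, and Lname to keep the sample data short.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    label: str
    op: Optional[Callable] = None            # operation of an internal node
    children: List["Node"] = field(default_factory=list)
    relation: Optional[list] = None          # input relation of a leaf node

    def execute(self):
        if not self.children:                # leaf: return the input relation
            return self.relation
        inputs = [child.execute() for child in self.children]
        return self.op(*inputs)              # operands are now available


def join(condition):
    return lambda r, s: [{**a, **b} for a in r for b in s if condition(a, b)]


# Made-up states of the three input relations.
P = Node("P", relation=[
    {"Pnumber": 10, "Plocation": "Stafford", "Dnum": 4},
    {"Pnumber": 1, "Plocation": "Bellaire", "Dnum": 5},
])
D = Node("D", relation=[
    {"Dnumber": 4, "Mgr_ssn": "987654321"},
    {"Dnumber": 5, "Mgr_ssn": "333445555"},
])
E = Node("E", relation=[
    {"Ssn": "987654321", "Lname": "Wallace"},
    {"Ssn": "333445555", "Lname": "Wong"},
])

sel = Node("select", lambda r: [t for t in r if t["Plocation"] == "Stafford"], [P])
join_pd = Node("join", join(lambda a, b: a["Dnum"] == b["Dnumber"]), [sel, D])
join_e = Node("join", join(lambda a, b: a["Mgr_ssn"] == b["Ssn"]), [join_pd, E])
root = Node(
    "project",
    lambda r: [{k: t[k] for k in ("Pnumber", "Dnum", "Lname")} for t in r],
    [join_e],
)

result = root.execute()
print(result)
```

The leaves hold the input relations, the internal nodes hold the operations, and executing the root produces the result relation for the whole query.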

Additional Relational Operations (aggregate, grouping, etc.)


Generalized Projection
✓ The generalized projection operation extends the projection operation by allowing functions of attributes
to be included in the projection list.
✓ The generalized form can be expressed as:
π_{F1, F2, ..., Fn}(R)
where F1, F2, … , Fn are functions over the attributes in relation R and may involve arithmetic
operations and constant values.
✓ This operation is helpful when developing reports where computed values have to be produced in the
columns of a query result.
✓ Example: Consider the relation EMPLOYEE(Ssn, Salary, Deduction, Years_service).
A report may be required to show
Net_salary = Salary – Deduction, Bonus = 2000 * Years_service, and Tax = 0.25 * Salary.
Then a generalized projection combined with renaming may be used as follows:
REPORT ← ρ_{(Ssn, Net_salary, Bonus, Tax)}(π_{Ssn, Salary – Deduction, 2000 * Years_service, 0.25 * Salary}(EMPLOYEE))
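The REPORT example can be sketched in Python by passing one function per output column. The helper name generalized_project and the sample tuples are assumptions. Naming each output column in the dict plays the role of the RENAME step, and duplicate elimination is left out for brevity.

```python
# Made-up state of EMPLOYEE(Ssn, Salary, Deduction, Years_service).
EMPLOYEE = [
    {"Ssn": "123456789", "Salary": 30000, "Deduction": 2000, "Years_service": 5},
    {"Ssn": "333445555", "Salary": 40000, "Deduction": 3500, "Years_service": 10},
]


def generalized_project(functions, relation):
    """functions maps each output attribute name to a function of one tuple."""
    return [{name: f(t) for name, f in functions.items()} for t in relation]


REPORT = generalized_project(
    {
        "Ssn": lambda t: t["Ssn"],
        "Net_salary": lambda t: t["Salary"] - t["Deduction"],
        "Bonus": lambda t: 2000 * t["Years_service"],
        "Tax": lambda t: 0.25 * t["Salary"],
    },
    EMPLOYEE,
)

print(REPORT[0])
```

For the first tuple this gives Net_salary = 28000, Bonus = 10000, and Tax = 7500.0.
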
Aggregate Functions and Grouping

✓ Aggregate functions apply mathematical operations to collections of values from the database. Examples
include retrieving the average or total salary of all employees or the total number of employee tuples.
✓ Common functions applied to collections of numeric values include SUM, AVERAGE, MAXIMUM,
and MINIMUM.
✓ The COUNT function is used for counting tuples or values.
✓ The AGGREGATE FUNCTION operation is defined, using the symbol ℑ, to specify these types of requests
as follows:
<grouping attributes> ℑ <function list> (R)
where <grouping attributes> is a list of attributes of the relation specified in R, and <function list> is a list
of (<function> <attribute>) pairs.
✓ In each such pair, <function> is one of the allowed functions, such as SUM, AVERAGE, MAXIMUM,
MINIMUM, or COUNT, and <attribute> is an attribute of the relation specified by R.
✓ The resulting relation has the grouping attributes plus one attribute for each element in the function list.
✓ For example, to retrieve each department number, the number of employees in the department, and their
average salary, while renaming the resulting attributes as indicated below, we write:
ρR(Dno, No_of_employees, Average_sal) (Dno ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE))
✓ If we do not want to rename the attributes, the query can be written as:
Dno ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE)
✓ Note: If no grouping attributes are specified, the functions are applied to all the tuples in the relation, so
the resulting relation has a single tuple only.
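The grouping query above can be sketched in Python as follows. The EMPLOYEE tuples, the helper names aggregate and average, and the output attribute names in TOTALS are assumptions made for illustration; COUNT is modelled with len, which counts every value in the group.

```python
from collections import defaultdict

# Hypothetical sample tuples for EMPLOYEE(Ssn, Salary, Dno).
EMPLOYEE = [
    {"Ssn": "1", "Salary": 30000, "Dno": 5},
    {"Ssn": "2", "Salary": 40000, "Dno": 5},
    {"Ssn": "3", "Salary": 25000, "Dno": 4},
    {"Ssn": "4", "Salary": 55000, "Dno": 1},
    {"Ssn": "5", "Salary": 43000, "Dno": 4},
]


def average(values):
    return sum(values) / len(values)


def aggregate(relation, grouping, functions):
    """Group tuples on the grouping attributes, then apply each (function, attribute) pair."""
    groups = defaultdict(list)
    for t in relation:
        groups[tuple(t[a] for a in grouping)].append(t)
    result = []
    for key, tuples in groups.items():
        row = dict(zip(grouping, key))  # the grouping attributes come first
        for out_name, (func, attr) in functions.items():
            row[out_name] = func([t[attr] for t in tuples])
        result.append(row)
    return result


# Dno ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE), with the renaming applied via the keys.
R = aggregate(EMPLOYEE, ["Dno"], {
    "No_of_employees": (len, "Ssn"),
    "Average_sal": (average, "Salary"),
})

# With no grouping attributes, all tuples fall into one group: a single result tuple.
TOTALS = aggregate(EMPLOYEE, [], {
    "Count_ssn": (len, "Ssn"),
    "Average_salary": (average, "Salary"),
})
```

R has one tuple per distinct Dno value, while TOTALS has exactly one tuple, matching the note about omitting the grouping attributes.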

OUTER JOIN Operations


✓ For a NATURAL JOIN operation R * S, only tuples from R that have matching tuples in S (and vice
versa) appear in the result. Hence, tuples without a matching (or related) tuple are eliminated from the
JOIN result.
✓ Tuples with NULL values in the join attributes are also eliminated. This type of join, where tuples with
no match are eliminated, is known as an inner join.
✓ This amounts to the loss of information if the user wants the result of the JOIN to include all the tuples in
one or more of the component relations.
✓ A set of operations, called outer joins, was developed for the case where the user wants to keep all the
tuples in R, or all those in S, or all those in both relations in the result of the JOIN, regardless of whether
or not they have matching tuples in the other relation.
✓ This satisfies the need of queries in which tuples from two tables are to be combined by matching
corresponding rows, but without losing any tuples for lack of matching values.
✓ For example, suppose that we want a list of all employee names as well as the names of the departments
they manage if they happen to manage a department; if they do not manage one, we can indicate it with a
NULL value. We can apply the LEFT OUTER JOIN operation, denoted by ⟕, to retrieve the result
as follows:
TEMP ← (EMPLOYEE ⟕ Ssn=Mgr_ssn DEPARTMENT)
RESULT ← πFname, Minit, Lname, Dname (TEMP)
✓ The LEFT OUTER JOIN operation keeps every tuple in the first, or left, relation R in R ⟕ S; if no matching
tuple is found in S, then the attributes of S in the join result are filled with NULL values.

✓ A similar operation, RIGHT OUTER JOIN, denoted by ⟖, keeps every tuple in the second, or right,
relation S in the result of R ⟖ S.


✓ A third operation, FULL OUTER JOIN, denoted by ⟗, keeps all tuples in both the left and
the right relations; when no matching tuples are found, the missing attributes are filled with NULL values.

Figure: right outer join, left outer join, and full outer join.
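The three outer joins can be sketched in Python with None standing in for NULL. The sample tuples (including a department without a manager), the helper names, and the omission of Minit are assumptions made to keep the example short; the sketch also assumes the two relations have no attribute names in common.

```python
# Hypothetical sample data; None plays the role of NULL.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Ssn": "123"},
    {"Fname": "Franklin", "Lname": "Wong", "Ssn": "333"},
]
DEPARTMENT = [
    {"Dname": "Research", "Mgr_ssn": "333"},
    {"Dname": "Archive", "Mgr_ssn": None},  # assumed department with no manager
]


def manages(t, u):
    # Join condition Ssn = Mgr_ssn; a NULL value never matches anything.
    return t["Ssn"] is not None and t["Ssn"] == u["Mgr_ssn"]


def left_outer_join(r, s, condition, s_attrs):
    """Keep every tuple of r; pad the attributes of s with None when nothing matches."""
    result = []
    for t in r:
        matches = [u for u in s if condition(t, u)]
        if matches:
            result.extend({**t, **u} for u in matches)
        else:
            result.append({**t, **{a: None for a in s_attrs}})
    return result


def right_outer_join(r, s, condition, r_attrs):
    """Keep every tuple of s: a left outer join with the operands swapped."""
    return left_outer_join(s, r, lambda u, t: condition(t, u), r_attrs)


def full_outer_join(r, s, condition, r_attrs, s_attrs):
    """Keep every tuple of both relations, padding whichever side has no match."""
    result = left_outer_join(r, s, condition, s_attrs)
    for u in s:
        if not any(condition(t, u) for t in r):
            result.append({**{a: None for a in r_attrs}, **u})
    return result


EMP_ATTRS = ["Fname", "Lname", "Ssn"]
DEPT_ATTRS = ["Dname", "Mgr_ssn"]

TEMP = left_outer_join(EMPLOYEE, DEPARTMENT, manages, DEPT_ATTRS)
RESULT = [{"Fname": t["Fname"], "Lname": t["Lname"], "Dname": t["Dname"]} for t in TEMP]
RIGHT = right_outer_join(EMPLOYEE, DEPARTMENT, manages, EMP_ATTRS)
FULL = full_outer_join(EMPLOYEE, DEPARTMENT, manages, EMP_ATTRS, DEPT_ATTRS)
```

In RESULT, the employee who manages no department still appears, with None in the Dname column; an inner join would have dropped that tuple.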

Examples of Queries in Relational Algebra

Query 1. Retrieve the name and address of all employees who work for the

Query 2. controlling department


address, and birth date.

Query 3. Find the names of employees who work on all the projects controlled by
department number 5.

Query 4. Make a list of project numbers for projects that involve an employee whose last name is
that controls the project.

Query 5. List the names of all employees with two or more dependents.

Query 6. Retrieve the names of employees who have no dependents.

Query 7. List the names of managers who have at least one dependent.
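Queries 5 to 7 can be sketched with Python sets, which mirror the algebra's grouping, set difference, and intersection. This is one possible formulation rather than the notes' own solution; the sample tuples and the helper names are assumptions, and DEPENDENT tuples are assumed to be distinct.

```python
from collections import Counter

# Hypothetical sample data loosely based on the COMPANY schema.
EMPLOYEE = [
    {"Ssn": "123", "Lname": "Smith"},
    {"Ssn": "333", "Lname": "Wong"},
    {"Ssn": "999", "Lname": "Zelaya"},
]
DEPENDENT = [
    {"Essn": "333", "Dependent_name": "Alice"},
    {"Essn": "333", "Dependent_name": "Theodore"},
    {"Essn": "123", "Dependent_name": "Michael"},
]
DEPARTMENT = [
    {"Dnumber": 5, "Mgr_ssn": "333"},
    {"Dnumber": 4, "Mgr_ssn": "999"},
]


def names(ssns):
    """Project the last names of the employees whose Ssn is in the given set."""
    return sorted(e["Lname"] for e in EMPLOYEE if e["Ssn"] in ssns)


all_ssns = {e["Ssn"] for e in EMPLOYEE}          # π Ssn (EMPLOYEE)
with_dependents = {d["Essn"] for d in DEPENDENT}  # π Essn (DEPENDENT)
managers = {d["Mgr_ssn"] for d in DEPARTMENT}     # π Mgr_ssn (DEPARTMENT)

# Query 5: group DEPENDENT by Essn, count, and select the groups with count >= 2.
dependents_per_employee = Counter(d["Essn"] for d in DEPENDENT)
query5 = names({ssn for ssn, n in dependents_per_employee.items() if n >= 2})

# Query 6: set difference between all employees and those with dependents.
query6 = names(all_ssns - with_dependents)

# Query 7: intersection of managers and employees with dependents.
query7 = names(managers & with_dependents)
```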


Mapping Conceptual Design into a Logical Design

Relational Database Design using ER-to-Relational mapping.

Step 1: Mapping of Regular Entity Types


• For each regular (strong) entity type E, create a relation R that includes all the simple attributes of E. Include only
the simple component attributes of a composite attribute.
• Choose one of the key attributes of E as the primary key for R.
• If the chosen key of E is a composite, then the set of simple attributes that form it will together form
the primary key of R.
• In our example COMPANY database, we create the relations EMPLOYEE, DEPARTMENT, and
PROJECT.
• We choose Ssn, Dnumber, and Pnumber as primary keys for the relations EMPLOYEE,
DEPARTMENT, and PROJECT, respectively.

Step 2: Mapping of Weak Entity Types


• For each weak entity type, create a relation R and include all simple attributes of the entity type as
attributes of R.
• Include the primary key attributes of the owner entity type's relation as foreign key attributes of R.
• In our example, we create the relation DEPENDENT in this step to correspond to the weak entity type
DEPENDENT.
• We include the primary key Ssn of the EMPLOYEE relation, which corresponds to the owner entity
type, as a foreign key attribute of DEPENDENT; we rename it as Essn.
• The primary key of the DEPENDENT relation is the combination {Essn, Dependent_name}, because
the dependent's name is the partial key of DEPENDENT.


Step 3: Mapping of Binary 1:1 Relationship Types


• For each binary 1:1 relationship type R in the ER schema, identify the relations S and T that
correspond to the entity types participating in R.
• There are three possible approaches:
- foreign key approach
- merged relation approach
- cross-reference or relationship relation approach
Foreign key approach:
• Choose one of the relations, say S, and include as a foreign key in S the primary key of T. It is
better to choose an entity type with total participation in R in the role of S.
• Include all the simple attributes (or simple components of composite attributes) of the 1:1 relationship
type R as attributes of S.

Merged relation approach:
• Merge the two entity types and the relationship into a single relation. This is possible when both
participations are total.
Cross-reference or relationship relation approach:
• Set up a third relation R for the purpose of cross-referencing the primary keys of the two relations S and T
representing the entity types.
• This approach is required for binary M:N relationships.
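The foreign key approach can be sketched for a 1:1 MANAGES relationship between EMPLOYEE and DEPARTMENT. The attribute Mgr_start_date, the sample rows, and the use of NOT NULL and UNIQUE to model total participation and the 1:1 ratio are assumptions of this sketch, not requirements stated in the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Lname TEXT);
-- DEPARTMENT plays the role of S: it receives the primary key of EMPLOYEE (T)
-- as the foreign key Mgr_ssn, plus the relationship attribute Mgr_start_date.
CREATE TABLE DEPARTMENT (
    Dnumber        INTEGER PRIMARY KEY,
    Dname          TEXT,
    Mgr_ssn        TEXT NOT NULL UNIQUE REFERENCES EMPLOYEE(Ssn),
    Mgr_start_date TEXT
);
INSERT INTO EMPLOYEE VALUES ('333', 'Wong');
INSERT INTO DEPARTMENT VALUES (5, 'Research', '333', '1988-05-22');
""")


def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# UNIQUE: the same employee cannot manage a second department (1:1 ratio).
second_department_ok = try_insert(
    "INSERT INTO DEPARTMENT VALUES (4, 'Administration', '333', '1995-01-01')")
# FOREIGN KEY: the manager must be an existing employee.
unknown_manager_ok = try_insert(
    "INSERT INTO DEPARTMENT VALUES (1, 'Headquarters', '000', '1981-06-19')")
```

Placing the foreign key in DEPARTMENT avoids NULL values, since every department has a manager while most employees manage nothing.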


Step 4: Mapping of Binary 1:N Relationship Types


• For each regular binary 1:N relationship type R, identify the relation S that represents the participating entity
type at the N-side of the relationship type.
• Include as a foreign key in S the primary key of the relation T that represents the other entity type participating
in R.
• For WORKS_FOR (an employee, on the N-side, works for one department, on the 1-side), we include the primary key
Dnumber of the DEPARTMENT relation as a foreign key in the EMPLOYEE relation and call it Dno.
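The WORKS_FOR mapping can be sketched in the same way. The column types and sample rows are assumptions; the point is that the foreign key Dno sits in the N-side relation EMPLOYEE, so many employees can reference one department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT);
-- EMPLOYEE is the N-side relation S; Dno references the 1-side relation T.
CREATE TABLE EMPLOYEE (
    Ssn   TEXT PRIMARY KEY,
    Lname TEXT,
    Dno   INTEGER REFERENCES DEPARTMENT(Dnumber)
);
INSERT INTO DEPARTMENT VALUES (5, 'Research'), (4, 'Administration');
INSERT INTO EMPLOYEE VALUES ('123', 'Smith', 5), ('333', 'Wong', 5), ('999', 'Zelaya', 4);
""")

# Each employee row holds exactly one Dno, while a department can be referenced many times.
staff_per_department = conn.execute("""
    SELECT D.Dname, COUNT(E.Ssn)
    FROM DEPARTMENT D JOIN EMPLOYEE E ON E.Dno = D.Dnumber
    GROUP BY D.Dnumber
    ORDER BY D.Dnumber
""").fetchall()
```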

Step 5: Mapping of Binary M:N Relationship Types


• For each binary M:N relationship type R, create a new relation S to represent R.
• Include the primary keys of the participating entity types as foreign key attributes in S. Also include any simple
attributes of the M:N relationship type as attributes of S.
• The propagate (CASCADE) option for the referential triggered action should be specified on the foreign keys
in the relation corresponding to the relationship R, since each relationship instance has an existence
dependency on each of the entities it relates.
• In our example, we map the M:N relationship type WORKS_ON by creating the relation WORKS_ON.
• We include the primary keys of the PROJECT and EMPLOYEE relations as foreign keys in
WORKS_ON and rename them Pno and Essn, respectively.
• The primary key of the WORKS_ON relation is the combination of the foreign key attributes {Essn, Pno}.
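The WORKS_ON mapping, including the CASCADE option, can be sketched as follows. The Hours attribute, the column types, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Lname TEXT);
CREATE TABLE PROJECT (Pnumber INTEGER PRIMARY KEY, Pname TEXT);
-- New relation for the M:N relationship; both foreign keys together form the key.
CREATE TABLE WORKS_ON (
    Essn  TEXT    NOT NULL REFERENCES EMPLOYEE(Ssn)    ON DELETE CASCADE,
    Pno   INTEGER NOT NULL REFERENCES PROJECT(Pnumber) ON DELETE CASCADE,
    Hours REAL,
    PRIMARY KEY (Essn, Pno)
);
INSERT INTO EMPLOYEE VALUES ('123', 'Smith'), ('333', 'Wong');
INSERT INTO PROJECT VALUES (1, 'ProductX'), (2, 'ProductY');
INSERT INTO WORKS_ON VALUES ('123', 1, 32.5), ('123', 2, 7.5), ('333', 2, 10.0);
""")

before = conn.execute("SELECT COUNT(*) FROM WORKS_ON").fetchone()[0]

# Deleting a project propagates to every relationship instance that involves it.
conn.execute("DELETE FROM PROJECT WHERE Pnumber = 2")
remaining = conn.execute("SELECT Essn, Pno FROM WORKS_ON ORDER BY Essn, Pno").fetchall()
```

After the delete, only the WORKS_ON tuple for project 1 is left, which illustrates why CASCADE suits a relation whose tuples cannot exist without both related entities.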

Step 6: Mapping of Multivalued Attributes


• For each multi-valued attribute A, create a new relation R.
• This relation will include an attribute corresponding to A, plus the primary key K of the parent relation
(entity type or relationship type) as a foreign key in R.
• The primary key of R is the combination of A and K.
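As a small illustration of this step, the sketch below flattens a multivalued Locations attribute of DEPARTMENT into a separate DEPT_LOCATIONS relation. The sample departments and the attribute names Locations and Dlocation are assumptions made for the example.

```python
# Hypothetical conceptual-level data: Locations is a multivalued attribute.
departments = [
    {"Dnumber": 5, "Dname": "Research", "Locations": {"Bellaire", "Sugarland", "Houston"}},
    {"Dnumber": 4, "Dname": "Administration", "Locations": {"Stafford"}},
]

# Parent relation: the multivalued attribute is left out.
DEPARTMENT = [{"Dnumber": d["Dnumber"], "Dname": d["Dname"]} for d in departments]

# New relation R(K, A): one tuple per (parent key, single value) combination.
DEPT_LOCATIONS = sorted(
    (d["Dnumber"], location)
    for d in departments
    for location in d["Locations"]
)

# The combination {Dnumber, Dlocation} is the primary key, so no pair may repeat.
key_is_unique = len(set(DEPT_LOCATIONS)) == len(DEPT_LOCATIONS)
```

Dnumber alone cannot be the key of DEPT_LOCATIONS, because department 5 appears in three tuples.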

Step 7: Mapping of N-ary Relationship Types


• For each n-ary relationship type R, where n > 2, create a new relation S to represent R.
• Include the primary keys of the participating entity types as foreign keys in S. Also include any simple attributes
of the relationship type as attributes of S.
• The primary key of S is usually a combination of all the foreign keys that reference the relations representing
the participating entity types.
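A ternary relationship can be sketched in the same style. The SUPPLY relationship among SUPPLIER, PART, and PROJECT, and all attribute names and types, are assumptions chosen for illustration; the sketch then reads the schema back to confirm the key structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SUPPLIER (Sname TEXT PRIMARY KEY);
CREATE TABLE PART (Part_no INTEGER PRIMARY KEY);
CREATE TABLE PROJECT (Proj_name TEXT PRIMARY KEY);
-- One foreign key per participating entity type; together they form the primary key.
CREATE TABLE SUPPLY (
    Sname     TEXT    NOT NULL REFERENCES SUPPLIER(Sname),
    Part_no   INTEGER NOT NULL REFERENCES PART(Part_no),
    Proj_name TEXT    NOT NULL REFERENCES PROJECT(Proj_name),
    Quantity  INTEGER,
    PRIMARY KEY (Sname, Part_no, Proj_name)
);
""")

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk);
# pk is the 1-based position of the column in the primary key, or 0.
columns = conn.execute("PRAGMA table_info(SUPPLY)").fetchall()
primary_key = [c[1] for c in sorted(columns, key=lambda c: c[5]) if c[5] > 0]

# PRAGMA foreign_key_list rows are (id, seq, table, from, to, ...).
referenced_tables = {fk[2] for fk in conn.execute("PRAGMA foreign_key_list(SUPPLY)")}
```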
