Mod 2

The document outlines the key concepts of the relational model, including relations (tables), tuples (rows), attributes (columns), and domains (valid values for attributes). It explains the structure of relation schemas, the significance of constraints in relational databases, and the different types of constraints such as inherent, schema-based, and application-based constraints. Additionally, it details various constraints like domain, key, NULL, entity integrity, and referential integrity constraints, along with examples of SQL implementations.

Uploaded by

harshitha s
Module 2

Relational Model
Key Concepts of the Relational Model
1.Relation (Table)
•Represents a collection of related data.
•Each table has a name (e.g., STUDENT).
2.Tuple (Row)
•Each row represents a single record or an entity (e.g., information about one student).
•Each tuple contains data values for multiple attributes.
3.Attribute (Column)
•Each column represents a named property of the entity (e.g., Name, Age, GPA); its values are drawn from a domain.
4.Domain
•A set of valid values for an attribute.
•Example:
•Usa_phone_numbers: Only valid 10-digit phone numbers.
•Employee_ages: Only integers between 15 and 80.
•Grade_point_averages: Real numbers between 0 and 4.
A relation schema R, denoted by R(A1, A2, ..., An), is made up of a relation name R and a list of attributes,
A1, A2, ..., An.
Each attribute Ai is the name of a role played by some domain D in the relation schema R. D is called the
domain of Ai and is denoted by dom(Ai ).
A relation schema is used to describe a relation; R is called the name of this relation. The degree of a
relation is the number of attributes n of its relation schema.
Relation Schema
•Describes the structure of a relation (table).
•Includes:
•Relation name (e.g., STUDENT).
•List of attributes (e.g., Name, Ssn, Age, etc.).
•STUDENT(Name: string, Ssn: string, Age: integer, Gpa: real)

Each attribute has a domain:


•dom(Name) = Names
•dom(Age) = Employee_ages
•dom(Gpa) = Grade_point_averages
A relation (or relation state) r of the relation schema R(A1, A2, ..., An), also denoted by r(R), is a set of n-tuples
r = {t1, t2, ..., tm}. Each n-tuple t is an ordered list of n values t = <v1, v2, ..., vn>, where each value vi, 1 ≤ i ≤ n, is an
element of dom(Ai) or is a special NULL value.
The ith value in tuple t, which corresponds to the attribute Ai, is referred to as t[Ai] or t.Ai (or t[i] if we use the
positional notation). The terms relation intension for the schema R and relation extension for a relation state r(R)
are also commonly used.
Relation State
•Represents the current data in a relation (table) at a given time.
•Denoted as r(R).
•Contains multiple tuples (rows).
•Each tuple has values for the corresponding attributes.

•Each row is a tuple (record).


• NULL values represent unknown or missing information.
•Each column header is an attribute.
• A relation (or relation state) r(R) is a mathematical relation of degree n on the
domains dom(A1), dom(A2), ..., dom(An), which is a subset of the Cartesian
product (denoted by ×) of the domains that define R:
r(R) ⊆ (dom(A1) × dom(A2) × ... × dom(An))
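The subset relationship can be sketched in a few lines of Python. The two domains below are hypothetical and deliberately tiny so that the full Cartesian product stays small; this is an illustration of the definition, not how a DBMS stores relations.

```python
from itertools import product

# Hypothetical, very small domains (assumptions for illustration only).
dom_name = {"Alice", "Bob"}
dom_age = {19, 20}

# Cartesian product dom(Name) x dom(Age): every possible (Name, Age) pair.
all_possible = set(product(dom_name, dom_age))

# A relation state r(R) is some subset of those combinations.
r = {("Alice", 19), ("Bob", 20)}
is_valid_state = r <= all_possible        # subset test

# A tuple with a value outside its domain cannot belong to any valid state.
bad_state = {("Alice", 45)}               # 45 is not in dom_age
is_bad_valid = bad_state <= all_possible
```

Here `is_valid_state` is true and `is_bad_valid` is false, which mirrors the statement that every tuple of r(R) must take each value from the corresponding domain.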
1. Ordering of Tuples in a Relation
• Mathematical Perspective:
Mathematically, a relation is a set of tuples. Sets are unordered collections, meaning there is no
inherent sequence in which elements appear. Consequently, the order of tuples in a relation is
irrelevant.
• Physical Storage:
Although relational theory specifies that tuple order doesn’t matter, database systems store these
tuples physically in files or tables. Consequently, they appear to have an order (e.g., first, second,
or last record). This physical ordering is merely a result of how the data is stored; logically, the
relation itself remains unordered.
• Display in Tables:
When a relation is presented as a table, its rows appear in a specific order, but this order is not part
of the relation’s logical structure. Any tuple order is considered valid.
•Example:
Suppose we have a STUDENT relation with tuples ordered by Name in one view and by Age in
another view.
• Both views are considered identical since order is irrelevant at the logical level.
2. Ordering of Values within a Tuple
•First Definition (Standard Relational Model):
In this definition, a tuple is an ordered list of values. The order is important to ensure each
value corresponds correctly to its associated attribute.
•Alternative Definition (Mapping Approach):
A more flexible definition treats a tuple as a mapping from attribute names to their respective values.
•Here, the order of attributes doesn’t matter because each attribute is explicitly paired with its value.
Example:
Consider two representations of a tuple:
1.(Name: "Alice", Age: 22, GPA: 3.8)
2.(Age: 22, Name: "Alice", GPA: 3.8)
Both are identical under the mapping approach since each value is labeled with its corresponding attribute.
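The difference between the two definitions can be sketched with Python's built-in types, assuming a dict stands in for the mapping view and a Python tuple for the ordered-list view.

```python
# Mapping view: a tuple is a set of (attribute, value) pairs.
t1 = {"Name": "Alice", "Age": 22, "GPA": 3.8}
t2 = {"Age": 22, "Name": "Alice", "GPA": 3.8}
same_mapping = (t1 == t2)     # dict equality ignores key order

# Ordered-list view: position decides which attribute a value belongs to.
l1 = ("Alice", 22, 3.8)       # read as (Name, Age, GPA)
l2 = (22, "Alice", 3.8)       # same values, different positions
same_list = (l1 == l2)
```

`same_mapping` is true while `same_list` is false: once the attribute names travel with the values, the order no longer carries meaning.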
3. Values and NULLs in Tuples
• Atomic Values:
Each value in a tuple is atomic, meaning it cannot be further divided. This
constraint aligns with the First Normal Form (1NF) in database design,
ensuring each field holds a single value rather than multiple or composite
values.
• NULL Values:
NULL values are used to indicate:
• Unknown values: e.g., it is not known whether a student has a home phone at all.
• Unavailable values: e.g., a phone number that exists but has not been recorded.
• Non-applicable values — e.g., an office phone field for students who do not have an
office.
• Example:
A Visa_status attribute may have NULL values for domestic students since
this field is only applicable to foreign students.
4. Interpretation (Meaning) of a Relation
A relation schema defines the structure of data and its intended meaning:
•As a Declaration/Assertion:
Each tuple represents a fact. For example, in a STUDENT relation, a tuple asserting "John Doe, Age 20"
declares that a student named John Doe is 20 years old.
•As a Predicate:
A relation schema can be interpreted as a predicate — a condition that each tuple satisfies.
•For instance, the relation STUDENT(Name, Ssn, Age) asserts that every entry is a valid student with corresponding
details.
Closed World Assumption:
This assumption states that the only true facts are those explicitly present in the relation.
Any fact not present is assumed to be false.

5. Entity and Relationship Representation


•Entity Representation:
Relations like STUDENT or EMPLOYEE describe individual entities with their attributes.
•Relationship Representation:
Relations like MAJORS(Student_ssn, Department_code) describe associations between entities (e.g., a
student’s major department).
Both entities and relationships are represented uniformly as relations in the relational model, which may sometimes
cause ambiguity in distinguishing the two.
1. Relation Schema (Structure of the Relation)
•A relation schema defines the structure of a relation, specifying the relation's name and its attributes.
•It is represented as:
R(A1, A2, ..., An)
Where:
•R → The relation schema name.
•A1, A2, ..., An → Attributes (columns) of the relation.
•The degree (or arity) of the relation is the number of attributes (i.e., n).

Example:
If we have a STUDENT relation schema:
STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa)
This schema has 7 attributes and a degree of 7.
2. Relation Names and States
•Uppercase Letters like R, S, and Q → Represent relation schemas.
•Lowercase Letters like r, s, and q → Represent relation states
(the actual data inside the relation at a particular moment).
•A relation state refers to the current set of tuples in the relation.
Example:
•STUDENT refers to the schema definition.
•r(STUDENT) refers to the current state of the STUDENT relation (i.e., the current rows/tuples stored in the
table).
3. Tuple Representation
•Tuples are represented by lowercase letters such as t, u, and v.
•An n-tuple (a tuple with n values) is denoted as:
t = <v1, v2, ..., vn>
Where:
•t is the tuple.
•Each v_i represents the value corresponding to attribute A_i.
Example: Consider the following tuple from the STUDENT relation:
t = <'Barbara Benson', '533-69-1238', '(817)839-8461', '7384 Fontana Lane', NULL, 19, 3.25>
In this example:
•t[Name] = 'Barbara Benson' (the value for the Name attribute).
•t[Ssn] = '533-69-1238'.
•t[Office_phone] = NULL.
4. Dot Notation for Attribute Qualification
•The dot notation helps distinguish attributes that may have the same name in different relations.
•It follows this format:
R.A
Where:
•R is the relation name.
•A is the attribute name.
This is essential in cases where two relations share an attribute name.
Example:
•STUDENT.Name refers to the Name attribute from the STUDENT relation.
•EMPLOYEE.Name refers to the Name attribute from the EMPLOYEE relation.
Since both relations have an attribute called Name, the dot notation prevents ambiguity.
5. Accessing Tuple Components
The notation for accessing individual or grouped values within a tuple is versatile:
✅ Accessing a Single Value:
•t[Ai] or t.Ai → Refers to the value of attribute Ai in tuple t.
•Occasionally, t[i] can also be used (especially in programming contexts) where i is the position of the attribute.
✅ Accessing Multiple Values (Subtuple):
•t[Au, Aw, ..., Az] or t.(Au, Aw, ..., Az) → Refers to a subtuple that includes only the specified attributes.
Example:
For the tuple:
t = <'Barbara Benson', '533-69-1238', '(817)839-8461', '7384 Fontana Lane', NULL, 19, 3.25>
•t[Name] or t.Name = 'Barbara Benson'
•t[Ssn, Gpa, Age] = <'533-69-1238', 3.25, 19>
This makes it easy to extract specific data from complex tuples.
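As a rough sketch, the same access patterns can be written in Python. The helper name `subtuple` is hypothetical, and `None` is assumed to stand in for NULL.

```python
ATTRS = ("Name", "Ssn", "Home_phone", "Address", "Office_phone", "Age", "Gpa")
values = ("Barbara Benson", "533-69-1238", "(817)839-8461",
          "7384 Fontana Lane", None, 19, 3.25)   # None stands in for NULL

# Pair each attribute name with its value so t[A] works by name.
t = dict(zip(ATTRS, values))

def subtuple(tup, *attrs):
    """Return t[Au, Aw, ...] as a Python tuple, in the order requested."""
    return tuple(tup[a] for a in attrs)

name = t["Name"]                          # t[Name]
sub = subtuple(t, "Ssn", "Gpa", "Age")    # t[Ssn, Gpa, Age]
second = values[1]   # positional access; Python counts from 0, so this is t[2]
```

Note that the subtuple follows the order of the requested attribute list, not the order of the schema.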
Relational model constraints
• In a relational database, constraints are essential for ensuring the accuracy,
reliability, and integrity of the data. These constraints enforce rules that reflect
the conditions of the real-world scenario being modeled. Let's explore the
three main categories of constraints and their key characteristics in detail.
1. Types of Constraints in Relational Databases
• Constraints can be classified into three categories:
(a) Inherent Constraints (Implicit Constraints)
• These constraints are automatically part of the relational model itself.
• They do not need to be explicitly defined by the database designer.
• Example:
• A relation must not contain duplicate tuples (since a relation is defined as a set of
tuples).
• Each attribute in a relation schema must have a unique name.
(b) Schema-Based Constraints (Explicit Constraints)
• These constraints are explicitly defined in the data definition language (DDL).
• They are enforced directly within the database schema and are automatically checked by the
database system.
• Examples of schema-based constraints include:
• Domain Constraints
• Key Constraints
• Constraints on NULL Values
• Entity Integrity Constraints
• Referential Integrity Constraints
(c) Application-Based Constraints (Semantic Constraints/Business Rules)
• These constraints are more complex and are enforced through application logic rather than the
database schema.
• They describe specific business rules that are difficult or impossible to express with the basic DDL constraints.
• Example:
• "A student must be at least 18 years old to enroll in a university."
• Such constraints are typically enforced through programming logic in the application layer.
2. Schema-Based Constraints in Detail
(a) Domain Constraints
•Domain constraints specify that the values for an attribute must belong to a predefined domain.
•A domain is a pool of valid values for an attribute. Each attribute is assigned a data type
that defines its domain.
Examples of Data Types for Domains:
•Numeric types: INTEGER, FLOAT, DOUBLE
•Character types: CHAR(n), VARCHAR(n)
•Boolean type: BOOLEAN (values TRUE and FALSE)
•Date and Time types: DATE, TIME, TIMESTAMP
•Custom domains: An attribute may use enumerated types or value ranges.

Example in SQL:
CREATE TABLE EMPLOYEE (
Emp_ID INTEGER,
Name VARCHAR(50),
Salary DECIMAL(10, 2) CHECK (Salary > 0),
Age INTEGER CHECK (Age BETWEEN 18 AND 65)
);
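One way to see the CHECK clauses at work is to run the same table definition against an in-memory SQLite database from Python. This is a sketch under stated assumptions: the helper `try_insert` and the sample rows are hypothetical, and SQLite is only one possible DBMS. SQLite enforces CHECK constraints, but it treats declared column types loosely, so the data-type part of a domain is not strictly enforced there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Emp_ID INTEGER,
        Name   VARCHAR(50),
        Salary DECIMAL(10, 2) CHECK (Salary > 0),
        Age    INTEGER CHECK (Age BETWEEN 18 AND 65)
    )
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert((1, "Asha", 52000.00, 30))        # all values inside their domains
bad_age = try_insert((2, "Ravi", 41000.00, 17))   # Age outside 18..65
bad_salary = try_insert((3, "Meena", -5.00, 40))  # Salary not > 0
stored = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
```

Only the first row is stored; the other two are rejected by the DBMS itself, without any application-side checking.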
(b) Key Constraints
•A key constraint enforces uniqueness in a relation to ensure no two tuples have identical values for
certain attributes.
•In the relational model, duplicate tuples are not allowed — this is an inherent constraint.
•A key is a minimal set of one or more attributes that uniquely identifies each tuple in a relation.
Types of Keys:
•Superkey: A set of one or more attributes that uniquely identifies each tuple in the relation.
Example: In a STUDENT relation, {Ssn} or {Ssn, Name, Age} are both superkeys. However, superkeys may contain extra
attributes that are not essential for uniqueness.
•Candidate Key: A minimal superkey with no redundant attributes.
Example: {Ssn} is a candidate key, while {Ssn, Name, Age} is a superkey but not a candidate key because Name and Age
are redundant.
•Primary Key: The chosen candidate key used to uniquely identify records. It is often selected based
on simplicity or performance considerations.

Example in SQL:
CREATE TABLE STUDENT (
Ssn CHAR(11) PRIMARY KEY,
Name VARCHAR(30),
Age INTEGER,
Gpa DECIMAL(3, 2)
);

•The Ssn attribute is marked as the primary key since it uniquely identifies each student.
•Unique Key: Other candidate keys that are not chosen as the primary key are designated as unique keys.
• Example of a Unique Key in SQL:
CREATE TABLE CAR (
License_number VARCHAR(15) PRIMARY KEY,
Engine_serial_number VARCHAR(20) UNIQUE
);
In this example:
•License_number is the primary key.
•Engine_serial_number is a unique key.
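The distinction between a superkey and a candidate key can be sketched as two small Python checks. The function names and the sample rows are hypothetical. The checks look at a single relation state only, whereas a real key is a property of the schema that must hold in every valid state.

```python
from itertools import combinations

# Hypothetical STUDENT state (none of these rows come from the slides).
rows = [
    {"Ssn": "111-11-1111", "Name": "Ann", "Age": 20},
    {"Ssn": "222-22-2222", "Name": "Ann", "Age": 21},
    {"Ssn": "333-33-3333", "Name": "Raj", "Age": 20},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on all of attrs (in this state only)."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """A superkey from which no single attribute can be dropped."""
    if not is_superkey(rows, attrs):
        return False
    # If any proper subset were a superkey, some subset one attribute
    # smaller would be too, so checking those is sufficient.
    return not any(is_superkey(rows, sub)
                   for sub in combinations(attrs, len(attrs) - 1))

ssn_is_ck = is_candidate_key(rows, ("Ssn",))
wide_is_sk = is_superkey(rows, ("Ssn", "Name", "Age"))
wide_is_ck = is_candidate_key(rows, ("Ssn", "Name", "Age"))
```

With these rows, {Name, Age} also happens to pass the candidate-key check, which shows why keys are declared from the meaning of the attributes rather than inferred from one state.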
(c) NULL Constraints
•The NULL constraint restricts whether an attribute can store NULL values or not.
•If an attribute is marked as NOT NULL, every tuple must have a non-null value for that attribute.
Example in SQL:
CREATE TABLE EMPLOYEE (
Emp_ID INTEGER PRIMARY KEY,
Name VARCHAR(30) NOT NULL,
Department VARCHAR(20) NULL
);

•The Department attribute may contain NULL.

•The Name attribute must have a value.


(d) Entity Integrity Constraint
•The entity integrity constraint ensures that the primary key of a relation cannot be
NULL.
•Since primary keys are used to uniquely identify tuples, they must always contain a valid
value.
Example in SQL:
CREATE TABLE DEPARTMENT (
Dept_ID INTEGER PRIMARY KEY,
Dept_Name VARCHAR(50) NOT NULL
);
•Dept_ID cannot have NULL values because it is the primary key.

(e) Referential Integrity Constraint


• A referential integrity constraint enforces a relationship between two
relations by linking a foreign key to a primary key in another relation.
• This constraint ensures that the foreign key's value must either:
• Match a value in the referenced primary key.
• Or be NULL.
Example in SQL:
CREATE TABLE STUDENT (
Ssn CHAR(11) PRIMARY KEY,
Name VARCHAR(30)
);
CREATE TABLE ENROLLMENT (
Course_ID VARCHAR(10),
Student_Ssn CHAR(11),
FOREIGN KEY (Student_Ssn) REFERENCES STUDENT(Ssn)
);

The Student_Ssn in ENROLLMENT must match a valid Ssn from the STUDENT table or be NULL.
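A minimal sketch of this rule, again using an in-memory SQLite database from Python. The helper `accepts` and the course IDs are hypothetical. One SQLite-specific detail is an assumption of this setup: foreign-key checking is off by default there and must be switched on with `PRAGMA foreign_keys = ON`; other systems enforce it without that step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite-specific: FK checks are off by default
conn.executescript("""
    CREATE TABLE STUDENT (
        Ssn  CHAR(11) PRIMARY KEY,
        Name VARCHAR(30)
    );
    CREATE TABLE ENROLLMENT (
        Course_ID   VARCHAR(10),
        Student_Ssn CHAR(11),
        FOREIGN KEY (Student_Ssn) REFERENCES STUDENT(Ssn)
    );
    INSERT INTO STUDENT VALUES ('533-69-1238', 'Barbara Benson');
""")

def accepts(course, ssn):
    """Return True if the ENROLLMENT row is accepted by the DBMS."""
    try:
        conn.execute("INSERT INTO ENROLLMENT VALUES (?, ?)", (course, ssn))
        return True
    except sqlite3.IntegrityError:
        return False

match_ok = accepts("CS101", "533-69-1238")   # matches an existing Ssn
null_ok = accepts("CS102", None)             # NULL foreign key is allowed
dangling = accepts("CS103", "000-00-0000")   # no such student
```

The first two inserts succeed and the third is rejected, matching the "match a referenced value or be NULL" rule.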
3. Application-Based (Semantic) Constraints
• These constraints reflect business logic or complex conditions that
cannot be directly implemented using DDL.
• They are enforced using triggers, stored procedures, or application
code.
• Example:
• “An employee's salary cannot exceed their manager’s salary.”
• “A customer can place no more than five orders in a single day.”
• Such rules are best implemented using programming logic within the
application.
4. Data Dependencies
• Data dependencies define relationships between attributes and help ensure database
consistency during the normalization process.
• The two main types are:
• Functional Dependencies — Relationships between attributes where one attribute’s value
determines another.
• Multivalued Dependencies: Occur when one attribute determines a set of values of another
attribute, independently of the remaining attributes in the relation.
Relational Model Operations

• Relational model operations are divided into two categories:


• Retrievals (Getting data)
• Updates (Changing data)
• Retrievals are specified using relational algebra or relational calculus.
• In relational algebra, you apply operations to existing tables to produce new
results (relations).
• Relational calculus is declarative: you state what result you want rather than the sequence of operations that computes it.
• Example: You write a query to find employees in department 4 →
DBMS returns the result as a new table.
Update Operations
These change data in the database. There are 3 types:
a. Insert
•Adds a new record (tuple) into a table.
•Can violate:
•Domain constraint: wrong data type or out-of-range value.
•Key constraint: duplicate primary key.
•Entity integrity: primary key is NULL.
•Referential integrity: foreign key doesn't match existing value.
Examples:
•❌ Insert with NULL Ssn → Violates Entity Integrity.
•❌ Insert with duplicate Ssn → Violates Key Constraint.
•❌ Insert with Dno = 7 (nonexistent department) → Violates Referential Integrity.
•✅ Valid insert with all constraints satisfied → Accepted.
b. Delete
•Removes one or more records from a table.
•Can violate only referential integrity, and only when tuples in other tables reference the deleted record.
Options if violation occurs:
1.Restrict: Reject deletion.
2.Cascade: Delete all dependent records too.
3.Set NULL/Default: Set foreign keys to NULL or default value.
Examples:
•✅ Delete from WORKS_ON → OK.
•❌ Delete employee referenced in WORKS_ON → Violates referential integrity.
•❌ Delete manager referenced in multiple tables → May cause multiple violations.
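The three options can be tried out with SQL's ON DELETE clause. The sketch below uses an in-memory SQLite database from Python; the function name, the two-column WORKS_ON table (Essn, Pno), and the sample values are assumptions made for illustration.

```python
import sqlite3

def dependents_after_delete(on_delete):
    """Delete a referenced EMPLOYEE row and report what happens to WORKS_ON."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # needed in SQLite
    conn.executescript(f"""
        CREATE TABLE EMPLOYEE (Ssn CHAR(9) PRIMARY KEY);
        CREATE TABLE WORKS_ON (
            Essn CHAR(9) REFERENCES EMPLOYEE(Ssn) ON DELETE {on_delete},
            Pno  INTEGER
        );
        INSERT INTO EMPLOYEE VALUES ('123456789');
        INSERT INTO WORKS_ON VALUES ('123456789', 1);
    """)
    try:
        conn.execute("DELETE FROM EMPLOYEE WHERE Ssn = '123456789'")
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT Essn, Pno FROM WORKS_ON").fetchall()

restrict = dependents_after_delete("RESTRICT")   # deletion is refused
cascade = dependents_after_delete("CASCADE")     # dependent row is deleted too
set_null = dependents_after_delete("SET NULL")   # foreign key becomes NULL
```

RESTRICT returns "rejected", CASCADE leaves WORKS_ON empty, and SET NULL keeps the WORKS_ON row with Essn set to NULL.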
c. Update (Modify)
•Changes values in existing records.
•Can violate:
•Key constraints
•Referential integrity
•Domain constraints
Examples:
•✅ Update salary → OK.
•✅ Update department number to existing value → OK.
•❌ Update department number to non-existing value → Violates Referential Integrity.
•❌ Change primary key to a value that already exists → Violates the Key Constraint (and Referential Integrity if other tuples reference the old value).
Important Notes:
•Changing primary key = similar to Delete + Insert
•If foreign keys are updated, new value must point to an existing record (or be NULL).
4. Transaction Concept
• A transaction is a unit of work that includes retrievals or updates.
• It should leave the database in a consistent state.
• Transactions must satisfy all constraints (key, integrity, etc.).
• A bank transaction example:
• Read balance
• Check if withdrawal is possible
• Update balance
• OLTP systems (like banking or shopping apps) run hundreds of
transactions per second.
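The bank example can be sketched as a single transaction using Python's sqlite3 module. The ACCOUNT table, its columns, and the function names are hypothetical. The `with conn:` block commits if the body finishes normally and rolls back if an exception escapes it, so the three steps take effect together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNT (Acc_no INTEGER PRIMARY KEY, Balance INTEGER)")
conn.execute("INSERT INTO ACCOUNT VALUES (1, 100)")
conn.commit()

def balance_of(acc_no):
    row = conn.execute("SELECT Balance FROM ACCOUNT WHERE Acc_no = ?",
                       (acc_no,)).fetchone()
    return row[0]

def withdraw(acc_no, amount):
    """Read balance, check it, update it, all as one unit of work."""
    try:
        with conn:   # commit on success, rollback if an exception escapes
            balance = balance_of(acc_no)                  # 1. read balance
            if balance < amount:                          # 2. check
                raise ValueError("insufficient funds")
            conn.execute(                                 # 3. update
                "UPDATE ACCOUNT SET Balance = Balance - ? WHERE Acc_no = ?",
                (amount, acc_no))
        return True
    except ValueError:
        return False

first = withdraw(1, 30)     # allowed: 100 -> 70
second = withdraw(1, 500)   # refused: balance stays 70
final_balance = balance_of(1)
```

After both calls the balance is 70: the refused withdrawal leaves the database exactly as it was.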
Relational Algebra and Relational Calculus
• Unary Relational Operations: SELECT and PROJECT
• What is the SELECT Operation?
• The SELECT operation is used to filter rows (tuples) from a table (relation)
based on a condition.
• It keeps only the tuples that satisfy a certain condition.
• σ<condition>(RelationName)
• σ (sigma) is the symbol for SELECT.
• <condition> is a Boolean condition using attributes from the relation.
• The output is a new relation with the same columns as the original but
fewer rows.
• Select employees from department 4:
• σ Dno = 4 (EMPLOYEE)
• Select employees with salary > 30000:
• σ Salary > 30000 (EMPLOYEE)
• Use AND in a condition:
• σ Dno = 4 AND Salary > 25000 (EMPLOYEE)
• Use OR in a condition:
• σ (Dno = 4 AND Salary > 25000) OR (Dno = 5 AND Salary > 30000) (EMPLOYEE)

•It’s a unary operation – works on one table at a time.


•Can use operators like: =, >, <, >=, <=, ≠
•You can use AND, OR, NOT to build complex conditions.
•SELECT only filters rows, it doesn't change the columns.
•You can combine multiple SELECTs into one with AND.
• SELECT in SQL
• The relational algebra expression
σ Dno = 4 AND Salary > 25000 (EMPLOYEE)
• is equivalent to the SQL query
SELECT *
FROM EMPLOYEE
WHERE Dno = 4 AND Salary > 25000;
• ■ (cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE;
otherwise, it is FALSE.
• ■ (cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are
TRUE; otherwise, it is FALSE.
• ■ (NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.
• The SELECT operator is unary; that is, it is applied to a single relation.
Moreover, the selection operation is applied to each tuple individually;
hence, selection conditions cannot involve more than one tuple.
• The degree of the relation resulting from a SELECT operation—its
number of attributes—is the same as the degree of R. The number of
tuples in the resulting relation is always less than or equal to the
number of tuples in R.
• The fraction of tuples selected by a selection condition is referred to as
the selectivity of the condition.
• Notice that the SELECT operation is commutative; that is,
σ<cond1>(σ<cond2>(R)) = σ<cond2>(σ<cond1>(R))
• Hence, a sequence of SELECTs can be applied in any order.
• In addition, we can always combine a cascade (or sequence) of
SELECT operations into a single SELECT operation with a conjunctive
(AND) condition; that is,
σ<cond1>(σ<cond2>(...(σ<condn>(R))...)) = σ<cond1> AND <cond2> AND ... AND <condn>(R)
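Both SELECT properties (commutativity and combining a cascade into one AND condition) can be checked on a small example. The sketch below models a relation as a list of Python dicts; the `select` helper and the sample EMPLOYEE rows are illustrative assumptions.

```python
# Hypothetical EMPLOYEE state; each tuple is a dict keyed by attribute name.
EMPLOYEE = [
    {"Lname": "Smith",   "Dno": 5, "Salary": 30000},
    {"Lname": "Wong",    "Dno": 5, "Salary": 40000},
    {"Lname": "Zelaya",  "Dno": 4, "Salary": 25000},
    {"Lname": "Wallace", "Dno": 4, "Salary": 43000},
]

def select(cond, relation):
    """sigma_cond(relation): keep the tuples for which cond is true."""
    return [t for t in relation if cond(t)]

def in_dept4(t):
    return t["Dno"] == 4

def well_paid(t):
    return t["Salary"] > 25000

# Commutativity: the two nesting orders give the same result.
a = select(in_dept4, select(well_paid, EMPLOYEE))
b = select(well_paid, select(in_dept4, EMPLOYEE))

# Cascade: one SELECT with a conjunctive (AND) condition.
c = select(lambda t: in_dept4(t) and well_paid(t), EMPLOYEE)

# Selectivity: fraction of tuples that satisfy the condition.
selectivity = len(c) / len(EMPLOYEE)
```

All three results contain only the Wallace tuple, and the selectivity of the combined condition on this state is 0.25.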
PROJECT Operation (π)
• The PROJECT operation selects specific columns (attributes) from a
table.
• It removes duplicate rows automatically (because relations are sets).
• π<attribute1, attribute2, ...>(RelationName)
Relational Algebra
π Sex, Salary (EMPLOYEE)
SQL Equivalent
SELECT DISTINCT Sex, Salary
FROM EMPLOYEE;
•Only columns are selected.
•Removes duplicates automatically.
•Cannot use conditions (that's for SELECT).
•The number of columns is reduced; the number of rows is reduced only
if duplicates are removed.
• If the attribute list includes only nonkey attributes of R, duplicate tuples are
likely to occur. The PROJECT operation removes any duplicate tuples, so the
result of the PROJECT operation is a set of distinct tuples, and hence a valid
relation. This is known as duplicate elimination.
• For example, consider the following PROJECT operation: π Sex, Salary (EMPLOYEE)
• The number of tuples in a relation resulting from a PROJECT operation is
always less than or equal to the number of tuples in R. If the projection list is a
superkey of R—that is, it includes some key of R—the resulting relation has the
same number of tuples as R. Moreover,
π<list1>(π<list2>(R)) = π<list1>(R)
• as long as <list2> contains the attributes in <list1>; otherwise, the left-hand
side is an incorrect expression. It is also noteworthy that commutativity does
not hold on PROJECT.
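Duplicate elimination and the nested-PROJECT rule can be sketched the same way. The `project` helper and the three sample rows are hypothetical; relations are modeled as lists of dicts.

```python
# Hypothetical EMPLOYEE state with Ssn as the key.
EMPLOYEE = [
    {"Ssn": "1", "Sex": "M", "Salary": 30000},
    {"Ssn": "2", "Sex": "M", "Salary": 30000},
    {"Ssn": "3", "Sex": "F", "Salary": 25000},
]

def project(attrs, relation):
    """pi_attrs(relation): keep only attrs, dropping duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:          # duplicate elimination
            result.append(row)
    return result

sex_salary = project(["Sex", "Salary"], EMPLOYEE)   # nonkey attributes only
with_key = project(["Ssn", "Sex"], EMPLOYEE)        # list includes the key

# pi_list1(pi_list2(R)) = pi_list1(R) when list2 contains list1.
nested = project(["Sex"], project(["Sex", "Salary"], EMPLOYEE))
direct = project(["Sex"], EMPLOYEE)
```

Projecting on nonkey attributes collapses the two identical (M, 30000) rows into one, while the projection that includes the key keeps all three tuples.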
Sequences of Operations and the RENAME Operation
• We can write the operations as a single relational algebra expression by nesting
the operations, or we can apply one operation at a time and create intermediate
result relations. In the latter case, we must give names to the relations that hold the intermediate
results. For example, to retrieve the first name, last name, and salary of all
employees who work in department number 5, we must apply a SELECT and a
PROJECT operation. We can write a single relational algebra expression, also
known as an in-line expression, as follows:
πFname, Lname, Salary(σDno=5(EMPLOYEE))
• The figure above shows the result of this in-line relational algebra expression.


Alternatively, we can explicitly show the sequence of operations,
giving a name to each intermediate relation, and using the
assignment operation, denoted by ← (left arrow), as follows:
DEP5_EMPS ← σDno=5(EMPLOYEE)
RESULT ← πFname, Lname, Salary(DEP5_EMPS)
• It is sometimes simpler to break down a complex sequence of
operations by specifying intermediate result relations than to write a
single relational algebra expression. We can also use this technique to
rename the attributes in the intermediate and result relations. This
can be useful in connection with more complex operations such as
UNION and JOIN, as we shall see. To rename the attributes in a
relation, we simply list the new attribute names in parentheses, as in
the following example:
TEMP ← σDno=5(EMPLOYEE)
R(First_name, Last_name, Salary) ← πFname, Lname, Salary(TEMP)
• We can also define a formal RENAME operation, which can rename
either the relation name or the attribute names, or both, as a unary
operator. The general RENAME operation when applied to a relation R
of degree n is denoted by any of the following three forms:
ρS(B1, B2, ..., Bn)(R) or ρS(R) or ρ(B1, B2, ..., Bn)(R)
where the symbol ρ (rho) is used to denote the RENAME operator, S is the new relation name, and B1, B2,
..., Bn are the new attribute names. The first expression renames both the relation and its attributes, the
second renames the relation only, and the third renames the attributes only. If the attributes of R are (A1,
A2, ..., An) in that order, then each Ai is renamed as Bi.

In SQL, renaming is done by aliasing with AS:
SELECT E.Fname AS First_name, E.Lname AS Last_name, E.Salary AS Salary
FROM EMPLOYEE AS E
WHERE E.Dno = 5;
• 🔹 UNION ( ∪ )
• Combines rows from two relations.
• Both must have the same number and type of attributes.
• Removes duplicates.
• 🔹 INTERSECTION ( ∩ )
• Returns common rows between two relations.
• 🔹 SET DIFFERENCE ( − )
• Returns rows in one relation that are not in another.
• 🔹 CARTESIAN PRODUCT ( × )
• Combines all rows of one table with all rows of another.
• Basis for JOIN, but usually large and inefficient.
The UNION, INTERSECTION, and MINUS Operations

• To retrieve the Social Security numbers of all employees who either
work in department 5 or directly supervise an employee who works in
department 5, we can use the UNION operation as follows:
DEP5_EMPS ← σDno=5(EMPLOYEE)
RESULT1 ← πSsn(DEP5_EMPS)
RESULT2(Ssn) ← πSuper_ssn(DEP5_EMPS)
RESULT ← RESULT1 ∪ RESULT2
The relation RESULT1 has the Ssn of all employees who work in
department 5, whereas RESULT2 has the Ssn of all employees who
directly supervise an employee who works in department 5. The UNION
operation produces the tuples that are in either RESULT1 or RESULT2 or
both.
• We can define the three operations UNION, INTERSECTION, and SET DIFFERENCE
on two union-compatible relations R and S as follows:
• UNION: The result of this operation, denoted by R ∪ S, is a relation that includes all tuples that are
either in R or in S or in both R and S. Duplicate tuples are eliminated.
• INTERSECTION: The result of this operation, denoted by R ∩ S, is a relation that
includes all tuples that are in both R and S.
• SET DIFFERENCE (or MINUS): The result of this operation, denoted by R − S, is a
relation that includes all tuples that are in R but not in S.
• UNION, INTERSECTION, and SET DIFFERENCE (also called MINUS or
EXCEPT) are binary operations; that is, each is applied to two
sets (of tuples).
• When these operations are adapted to relational databases, the two
relations on which any of these three operations are applied must
have the same type of tuples; this condition has been called union
compatibility or type compatibility.
• Two relations R(A1, A2, … , An) and S(B1, B2, … , Bn) are said to be
union compatible (or type compatible) if they have the same degree n
and if dom(Ai) = dom(Bi) for 1 ≤ i ≤ n.
• This means that the two relations have the same number of attributes
and each corresponding pair of attributes has the same domain.
• Notice that both UNION and INTERSECTION are commutative
operations; that is, R ∪ S = S ∪ R and R ∩ S = S ∩ R
• Both UNION and INTERSECTION can be treated as n-ary operations
applicable to any number of relations because both are also associative
operations; that is,
• R ∪ (S ∪ T ) = (R ∪ S) ∪ T and (R ∩ S) ∩ T = R ∩ (S ∩ T)
• The MINUS operation is not commutative; R − S ≠ S − R
• Note that INTERSECTION can be expressed in terms of union and set
difference as follows: R ∩ S = ((R ∪ S) − (R − S)) − (S − R)
• In SQL, there are three operations (UNION, INTERSECT, and EXCEPT)
that correspond to the set operations described here.
• In addition, there are multiset operations (UNION ALL, INTERSECT ALL, and EXCEPT
ALL) that do not eliminate duplicates.
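Because relations are sets of tuples, these operations map directly onto Python's set operators. The sketch below uses two hypothetical union-compatible relations of degree 1 and also checks the identity that expresses INTERSECTION through UNION and SET DIFFERENCE.

```python
# Two union-compatible relations of degree 1 (hypothetical Ssn values).
RESULT1 = {("123",), ("333",), ("666",)}
RESULT2 = {("333",), ("888",)}

union = RESULT1 | RESULT2            # R ∪ S, duplicates eliminated
intersection = RESULT1 & RESULT2     # R ∩ S
difference = RESULT1 - RESULT2       # R − S

# R ∩ S written with only union and difference.
via_identity = ((RESULT1 | RESULT2) - (RESULT1 - RESULT2)) - (RESULT2 - RESULT1)

# MINUS is not commutative.
minus_commutes = (RESULT1 - RESULT2) == (RESULT2 - RESULT1)
```

The tuple ("333",) appears in both inputs but only once in the union, and `via_identity` equals `intersection`.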
The CARTESIAN PRODUCT (CROSS PRODUCT) Operation

• The CARTESIAN PRODUCT operation, also known as CROSS PRODUCT or
CROSS JOIN, is denoted by ×. This is also a binary set operation, but
the relations on which it is applied do not have to be union compatible.
• In its binary form, this set operation produces a new element by combining
every member (tuple) from one relation (set) with every member (tuple)
from the other relation (set).
• In general, the result of R(A1, A2, ..., An) × S(B1, B2, ..., Bm) is a relation Q
of degree n + m with attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order.
• The resulting relation Q has one tuple for each combination of tuples—one
from R and one from S.
• Hence, if R has nR tuples (denoted as |R| = nR), and S has nS tuples, then R ×
S will have nR * nS tuples.
• The n-ary CARTESIAN PRODUCT operation is an extension of the
above concept, which produces new tuples by concatenating all
possible combinations of tuples from n underlying relations.
• For example, suppose that we want to retrieve a list of names of each
female employee’s dependents. We can do this as follows:
FEMALE_EMPS ← σSex=‘F’(EMPLOYEE)
EMPNAMES ← πFname, Lname, Ssn(FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σSsn=Essn(EMP_DEPENDENTS)
RESULT ← πFname, Lname, Dependent_name(ACTUAL_DEPENDENTS)
• The EMP_DEPENDENTS relation is the result of applying the
CARTESIAN PRODUCT operation to EMPNAMES from Figure 8.5 with
DEPENDENT from Figure 5.6. In EMP_DEPENDENTS, every tuple from
EMPNAMES is combined with every tuple from DEPENDENT, giving a
result that is not very meaningful (every dependent is combined with
every female employee). We want to combine a female employee
tuple only with her particular dependents, namely the DEPENDENT
tuples whose Essn value matches the Ssn value of the EMPLOYEE tuple.
The ACTUAL_DEPENDENTS relation accomplishes this.
• The CARTESIAN PRODUCT creates tuples with the combined attributes
of two relations. We can SELECT related tuples only from the two
relations by specifying an appropriate selection condition after the
Cartesian product.
• Because this sequence of CARTESIAN PRODUCT followed by SELECT is
quite commonly used to combine related tuples from two relations, a
special operation, called JOIN, was created to specify this sequence as
a single operation.
• In SQL, CARTESIAN PRODUCT can be realized by using the CROSS JOIN
option in joined tables.
• Alternatively, if there are two tables in the FROM clause and there is
no corresponding join condition in the WHERE clause of the SQL
query, the result will also be the CARTESIAN PRODUCT of the two
tables.
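The product-then-select sequence for the dependents query can be traced step by step in Python. The rows below are made-up sample data, and relations are modeled as lists of dicts with distinct attribute names so that two tuples can simply be merged.

```python
from itertools import product

# Hypothetical sample data for the two input relations.
EMPNAMES = [
    {"Fname": "Alicia",   "Lname": "Zelaya",  "Ssn": "999887777"},
    {"Fname": "Jennifer", "Lname": "Wallace", "Ssn": "987654321"},
]
DEPENDENT = [
    {"Essn": "987654321", "Dependent_name": "Abner"},
    {"Essn": "123456789", "Dependent_name": "Alice"},
]

# EMPNAMES x DEPENDENT: concatenate every pair of tuples.
EMP_DEPENDENTS = [{**e, **d} for e, d in product(EMPNAMES, DEPENDENT)]

# sigma Ssn=Essn: keep only the combinations that belong together.
ACTUAL_DEPENDENTS = [t for t in EMP_DEPENDENTS if t["Ssn"] == t["Essn"]]

# pi Fname, Lname, Dependent_name
RESULT = [(t["Fname"], t["Lname"], t["Dependent_name"])
          for t in ACTUAL_DEPENDENTS]
```

The product has 2 × 2 = 4 tuples, most of them meaningless pairings; the selection keeps the single tuple in which Ssn equals Essn.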
JOIN Operation (⨝)

• JOIN combines two tables based on a related attribute (like a foreign key).
• It merges matching rows.
• R ⨝<condition> S
•R and S are two relations (tables).
•The condition is usually something like: R.attr = S.attr
• The JOIN operation, denoted by ⨝, is used to combine related tuples from
two relations into single “longer” tuples.
• This operation is very important for any relational database with more
than a single relation because it allows us to process relationships among
relations.
Join EMPLOYEE and DEPARTMENT where Dno = Dnumber:
EMPLOYEE ⨝ EMPLOYEE.Dno = DEPARTMENT.Dnumber DEPARTMENT
•JOIN combines rows from two tables with matching values.
•Common JOIN conditions are equality conditions.
•Result has all columns from both tables.
•Can also be written using SELECT + CARTESIAN PRODUCT, like:
σ EMPLOYEE.Dno = DEPARTMENT.Dnumber (EMPLOYEE × DEPARTMENT)

• To illustrate JOIN, suppose that we want to retrieve the name of the
manager of each department. To get the manager’s name, we need to
combine each department tuple with the employee tuple whose Ssn
value matches the Mgr_ssn value in the department tuple. We do this
by using the JOIN operation and then projecting the result over the
necessary attributes, as follows:
• DEPT_MGR ← DEPARTMENT ⨝ Mgr_ssn=Ssn EMPLOYEE
• RESULT ← π Dname, Lname, Fname (DEPT_MGR)
• Mgr_ssn is a foreign key of the DEPARTMENT relation that references
Ssn, the primary key of the EMPLOYEE relation. This referential
integrity constraint plays a role in having matching tuples in the
referenced relation EMPLOYEE.
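• The manager query can be tried out with Python's built-in sqlite3 module. This is only a sketch: the table contents are invented, the tables are cut down to the columns the query needs, and the aliases D and E are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Fname TEXT, Lname TEXT);
    CREATE TABLE DEPARTMENT (
        Dnumber INTEGER PRIMARY KEY,
        Dname   TEXT,
        Mgr_ssn TEXT REFERENCES EMPLOYEE(Ssn)
    );
    INSERT INTO EMPLOYEE VALUES
        ('111', 'Ann', 'Lee'), ('222', 'Bob', 'Ray'), ('333', 'Cy', 'Fox');
    INSERT INTO DEPARTMENT VALUES (1, 'Sales', '222'), (2, 'Research', '111');
""")

# JOIN on Mgr_ssn = Ssn, then project onto the attributes we need.
rows = conn.execute("""
    SELECT D.Dname, E.Lname, E.Fname
    FROM DEPARTMENT D JOIN EMPLOYEE E ON D.Mgr_ssn = E.Ssn
    ORDER BY D.Dnumber
""").fetchall()

print(rows)
```

• Employee '333' manages no department, so that tuple does not appear in the result.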
• The JOIN operation can be specified as a CARTESIAN PRODUCT
operation followed by a SELECT operation. However, JOIN is very
important because it is used frequently when specifying database
queries.
• In JOIN, only combinations of tuples satisfying the join condition appear in the
result, whereas in the CARTESIAN PRODUCT all combinations of tuples are
included in the result.
• A general join condition is of the form
• <condition> AND <condition> AND … AND <condition>
• where each <condition> is of the form Ai θ Bj, Ai is an attribute of R, Bj is an
attribute of S, Ai and Bj have the same domain, and θ (theta) is one of the
comparison operators {=, <, ≤, >, ≥, ≠}.
• A JOIN operation with such a general join condition is called a THETA JOIN.
Tuples whose join attributes are NULL or for which the join condition is FALSE
do not appear in the result.
• In that sense, the JOIN operation does not necessarily preserve all of the
information in the participating relations, because tuples that do not get
combined with matching ones in the other relation do not appear in the result.
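• A small Python sketch of a THETA JOIN, assuming relations are lists of dicts and None stands in for NULL. The function name theta_join and the sample tuples are hypothetical, and treating any comparison with None as "not satisfied" is a simplification of SQL's three-valued logic.

```python
import operator
from itertools import product

# The comparison operators allowed for theta.
THETA = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
         ">": operator.gt, ">=": operator.ge, "!=": operator.ne}

def theta_join(r, s, conditions):
    """Join r and s on a conjunction of (Ai, theta, Bj) conditions.

    A pair of tuples is dropped when any condition is false or when
    either compared value is None (standing in for NULL).
    """
    result = []
    for t1, t2 in product(r, s):
        ok = True
        for a, theta, b in conditions:
            x, y = t1[a], t2[b]
            if x is None or y is None or not THETA[theta](x, y):
                ok = False
                break
        if ok:
            result.append({**t1, **t2})
    return result

EMPLOYEE = [{"Ename": "Ann", "Dno": 5},
            {"Ename": "Bob", "Dno": None},
            {"Ename": "Cy", "Dno": 4}]
DEPARTMENT = [{"Dnumber": 5, "Dname": "Research"},
              {"Dnumber": 1, "Dname": "Headquarters"}]

equi = theta_join(EMPLOYEE, DEPARTMENT, [("Dno", "=", "Dnumber")])
greater = theta_join(EMPLOYEE, DEPARTMENT, [("Dno", ">", "Dnumber")])

print(equi)
print(len(greater))
```

• In the equality case Bob (NULL Dno), Cy (no department 4 in this sample) and Headquarters (no employees) all disappear, which illustrates how JOIN can lose information from the participating relations.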
Variations of JOIN: The EQUIJOIN and NATURAL JOIN

• The most common use of JOIN involves join conditions with equality
comparisons only. Such a JOIN, where the only comparison operator
used is =, is called an EQUIJOIN.
• In other words, an EQUIJOIN is a join whose condition is based only on
equality comparisons between attributes of the two relations.
Example: consider the PROJECT and DEPARTMENT relations.
• If we want to join PROJECT and DEPARTMENT where PROJECT.Dnum = DEPARTMENT.Dnumber, this is an EQUIJOIN:
• PROJECT ⋈ PROJECT.Dnum = DEPARTMENT.Dnumber DEPARTMENT

• Both Dnum and Dnumber are kept, even though they have the same value.
• NATURAL JOIN: Auto-match on Common Attribute Names
• A NATURAL JOIN automatically finds the attribute names common to both
relations and performs an EQUIJOIN on them, keeping only one copy of each join attribute.
• To use NATURAL JOIN on PROJECT and DEPARTMENT, first rename
DEPARTMENT.Dnumber to Dnum so that it matches PROJECT.Dnum:
• DEPT ← ρ(Dname, Dnum, Mgr_ssn)(DEPARTMENT)
• PROJ_DEPT ← PROJECT * DEPT
• Equivalently, written with the ⋈ symbol and no explicit condition: PROJ_DEPT ← PROJECT ⋈ DEPT
• Dnum is the join attribute, and since it is present in both relations, it appears only
once in the result.
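• The rename-then-NATURAL-JOIN steps can be sketched in Python as follows. Relations are assumed to be lists of dicts, the helper names rename and natural_join are hypothetical, and the sample tuples are for illustration only.

```python
def rename(r, mapping):
    """rho: rename attributes according to mapping {old_name: new_name}."""
    return [{mapping.get(k, k): v for k, v in t.items()} for t in r]

def natural_join(r, s):
    """Equijoin on every attribute name shared by r and s, keeping one copy."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**t1, **t2}
            for t1 in r for t2 in s
            if all(t1[a] == t2[a] for a in common)]

PROJECT = [{"Pname": "ProductX", "Pnumber": 1, "Dnum": 5},
           {"Pname": "Computerization", "Pnumber": 10, "Dnum": 4}]
DEPARTMENT = [{"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "333445555"},
              {"Dname": "Administration", "Dnumber": 4, "Mgr_ssn": "987654321"}]

# Rename Dnumber to Dnum so the join attribute has the same name in both.
DEPT = rename(DEPARTMENT, {"Dnumber": "Dnum"})
PROJ_DEPT = natural_join(PROJECT, DEPT)

print(PROJ_DEPT)
```

• Because the merged dict has a single Dnum key, the join attribute appears once per result tuple. If the two relations share no attribute names, this sketch degenerates to the Cartesian product.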
• JOIN as CARTESIAN PRODUCT + SELECTION
• Any JOIN can be written as a Cartesian Product (×) followed by a Selection (σ).
• PROJECT ⋈ PROJECT.Dnum = DEPARTMENT.Dnumber DEPARTMENT
≡ σ(PROJECT.Dnum = DEPARTMENT.Dnumber)(PROJECT × DEPARTMENT)
• You first pair all combinations of tuples (like in a cross join), then filter only those
where Dnum = Dnumber.
• Multi-way Joins (n-way JOIN)
• You can chain multiple JOINs:
• ((PROJECT ⋈ Dnum=Dnumber DEPARTMENT) ⋈ Mgr_ssn=Ssn
EMPLOYEE)
• This combines:
• Each PROJECT with its controlling DEPARTMENT
• Then adds the EMPLOYEE who manages that department
• This gives a single record that includes:
• Project Info
• Department Info
• Manager Info
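• The chained join can be expressed in SQL and run through Python's sqlite3 module. The tables below are trimmed to the columns the joins need and filled with invented rows, so this is a sketch rather than the full schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT, Lname TEXT);
    CREATE TABLE DEPARTMENT (Dnumber INTEGER, Dname TEXT, Mgr_ssn TEXT);
    CREATE TABLE PROJECT (Pnumber INTEGER, Pname TEXT, Dnum INTEGER);
    INSERT INTO EMPLOYEE VALUES ('111', 'Lee'), ('222', 'Ray');
    INSERT INTO DEPARTMENT VALUES
        (4, 'Administration', '222'), (5, 'Research', '111');
    INSERT INTO PROJECT VALUES (1, 'ProductX', 5), (10, 'Computerization', 4);
""")

# First join: each project with its controlling department.
# Second join: add the employee who manages that department.
three_way = conn.execute("""
    SELECT P.Pname, D.Dname, E.Lname
    FROM PROJECT P
    JOIN DEPARTMENT D ON P.Dnum = D.Dnumber
    JOIN EMPLOYEE E   ON D.Mgr_ssn = E.Ssn
    ORDER BY P.Pnumber
""").fetchall()

print(three_way)
```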
• NATURAL JOIN without Renaming
• If both relations already have attributes with the same name (e.g.,
Dnumber), NATURAL JOIN works without renaming.
• DEPT_LOCS ← DEPARTMENT ⋈ DEPT_LOCATIONS
• JOIN Selectivity
• If:
• R has nR tuples
• S has nS tuples
• Then:
• Cartesian Product: nR × nS tuples
• JOIN: between 0 and nR × nS tuples
• Join Selectivity = (number of tuples in the join result) / (nR × nS)
• The ratio lies between 0 and 1: a low value means few of the possible pairs match, a high value means many do.
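• A short Python sketch of the selectivity ratio, assuming relations are lists of dicts. The function name join_selectivity and the sample data are hypothetical.

```python
def join_selectivity(r, s, condition):
    """Fraction of the n_R * n_S possible pairs that satisfy the condition."""
    if not r or not s:
        return 0.0
    matches = sum(1 for t1 in r for t2 in s if condition(t1, t2))
    return matches / (len(r) * len(s))

PROJECT = [{"Pnumber": 1, "Dnum": 5},
           {"Pnumber": 2, "Dnum": 5},
           {"Pnumber": 10, "Dnum": 4}]
DEPARTMENT = [{"Dnumber": 5}, {"Dnumber": 4}, {"Dnumber": 1}]

# Equijoin condition: 3 of the 9 possible pairs match.
sel = join_selectivity(PROJECT, DEPARTMENT,
                       lambda p, d: p["Dnum"] == d["Dnumber"])

# A condition that is always true gives the Cartesian product.
sel_all = join_selectivity(PROJECT, DEPARTMENT, lambda p, d: True)

print(sel, sel_all)
```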
SQL Implementation of JOINs
• Method 1: Join in WHERE clause
• SELECT * FROM PROJECT, DEPARTMENT WHERE PROJECT.Dnum =
DEPARTMENT.Dnumber;
• Method 2: Explicit JOIN
• SELECT * FROM PROJECT JOIN DEPARTMENT ON PROJECT.Dnum =
DEPARTMENT.Dnumber;
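• Both methods can be checked against each other with Python's sqlite3 module. The rows below are invented, and the ORDER BY clause is added only to make the two results directly comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PROJECT (Pnumber INTEGER, Pname TEXT, Dnum INTEGER);
    CREATE TABLE DEPARTMENT (Dnumber INTEGER, Dname TEXT);
    INSERT INTO PROJECT VALUES
        (1, 'ProductX', 5), (10, 'Computerization', 4), (99, 'Orphan', 7);
    INSERT INTO DEPARTMENT VALUES (4, 'Administration'), (5, 'Research');
""")

# Method 1: join condition in the WHERE clause.
where_style = conn.execute(
    "SELECT * FROM PROJECT, DEPARTMENT "
    "WHERE PROJECT.Dnum = DEPARTMENT.Dnumber ORDER BY Pnumber"
).fetchall()

# Method 2: explicit JOIN ... ON.
join_style = conn.execute(
    "SELECT * FROM PROJECT JOIN DEPARTMENT "
    "ON PROJECT.Dnum = DEPARTMENT.Dnumber ORDER BY Pnumber"
).fetchall()

print(where_style == join_style)
print(join_style)
```

• Each result row carries both Dnum and Dnumber, as expected for an EQUIJOIN, and the project with Dnum 7 is missing because no department matches it.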
• JOINs in Relational Algebra Core Set
• The basic relational algebra operations are:
• σ (Selection)
• π (Projection)
• × (Cartesian Product)
• ρ (Rename)
• ∪ (Union)
• – (Difference)
• JOIN is not strictly necessary—it can be built from:
• R ⋈<condition> S ≡ σ<condition>(R × S)
• NATURAL JOIN from Basic Ops:
• Rename to avoid name conflicts
• Cartesian Product
• Selection on matching attributes
• Projection to remove duplicates
• INNER JOIN
• Definition:
• An INNER JOIN returns only the rows where there is a match in both
tables based on the join condition.
• SELECT E.Name, D.DeptName
• FROM EMPLOYEE E
• INNER JOIN DEPARTMENT D ON E.DeptID = D.DeptID;
• OUTER JOIN
• Definition:
• OUTER JOIN includes matching rows and the non-matching rows
from one or both tables, filling in NULLs for missing values.
• Types of Outer Joins:
• LEFT OUTER JOIN: All rows from the left table + matching rows from
the right.
• RIGHT OUTER JOIN: All rows from the right table + matching rows
from the left.
• FULL OUTER JOIN: All rows from both tables; unmatched rows are
filled with NULLs.
• LEFT OUTER JOIN Example:
• SELECT E.Name, D.DeptName
• FROM EMPLOYEE E
• LEFT OUTER JOIN DEPARTMENT D ON E.DeptID = D.DeptID;
• RIGHT OUTER JOIN Example:
• SELECT E.Name, D.DeptName
• FROM EMPLOYEE E
• RIGHT OUTER JOIN DEPARTMENT D ON E.DeptID = D.DeptID;

• Here, Marketing appears even though no employee is assigned to it.


• FULL OUTER JOIN Example:
• SELECT E.Name, D.DeptName
• FROM EMPLOYEE E
• FULL OUTER JOIN DEPARTMENT D ON E.DeptID = D.DeptID;

• Includes all records from both tables. Where there is no match, NULLs are filled in.
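• The three outer joins can be compared on a tiny data set with Python's sqlite3 module. Older SQLite builds may not accept RIGHT or FULL OUTER JOIN, so this sketch uses only LEFT OUTER JOIN: the right outer join is written with the tables swapped, and the full outer join is emulated with UNION ALL. The rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Name TEXT, DeptID INTEGER);
    CREATE TABLE DEPARTMENT (DeptID INTEGER, DeptName TEXT);
    INSERT INTO EMPLOYEE VALUES ('Asha', 1), ('Ravi', NULL);
    INSERT INTO DEPARTMENT VALUES (1, 'HR'), (2, 'Marketing');
""")

# LEFT OUTER JOIN: every employee, with NULL where no department matches.
left_rows = conn.execute("""
    SELECT E.Name, D.DeptName
    FROM EMPLOYEE E LEFT OUTER JOIN DEPARTMENT D ON E.DeptID = D.DeptID
    ORDER BY E.Name
""").fetchall()

# RIGHT OUTER JOIN, written as a LEFT OUTER JOIN with the tables swapped.
right_rows = conn.execute("""
    SELECT E.Name, D.DeptName
    FROM DEPARTMENT D LEFT OUTER JOIN EMPLOYEE E ON E.DeptID = D.DeptID
    ORDER BY D.DeptName
""").fetchall()

# FULL OUTER JOIN emulation: left join plus the unmatched departments.
full_rows = conn.execute("""
    SELECT E.Name, D.DeptName
    FROM EMPLOYEE E LEFT OUTER JOIN DEPARTMENT D ON E.DeptID = D.DeptID
    UNION ALL
    SELECT NULL, D.DeptName
    FROM DEPARTMENT D
    WHERE NOT EXISTS (SELECT 1 FROM EMPLOYEE E WHERE E.DeptID = D.DeptID)
""").fetchall()

print(left_rows)
print(right_rows)
print(full_rows)
```

• Ravi has no department and Marketing has no employee, so each shows up only in the outer join that preserves its side, padded with NULL (None in Python).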
• CROSS JOIN (Cartesian Product)
• Definition:
• A CROSS JOIN returns the Cartesian product of two tables—every
row from the first table is combined with every row from the
second.
• SELECT E.Name, D.DeptName
• FROM EMPLOYEE E
• CROSS JOIN DEPARTMENT D;

• Every row in EMPLOYEE is paired with every row in DEPARTMENT.


• The DIVISION Operation
• The DIVISION operation, denoted by ÷, is useful for a special kind of
query that sometimes occurs in database applications.
• An example is: “Retrieve the names of employees who work on all the
projects that ‘John Smith’ works on.”
• In general, the DIVISION operation is applied to two relations
R(Z) ÷ S(X), where the attributes of S are a subset of the attributes of R;
that is, X ⊆ Z. Let Y be the set of attributes of R that are not attributes
of S; that is, Y = Z – X (and hence Z = X ∪ Y).
• The result is a relation T(Y): a tuple t is in T if, for every tuple tS in S,
R contains a tuple tR with tR[X] = tS and tR[Y] = t.
• To express the example query using the DIVISION operation, proceed as
follows. First, retrieve the list of project numbers that ‘John Smith’ works on
in the intermediate relation SMITH_PNOS:
• SMITH ← σFname=‘John’ AND Lname=‘Smith’(EMPLOYEE)
• SMITH_PNOS ← πPno(WORKS_ON ⨝ Essn=Ssn SMITH)
• Next, create a relation that includes a tuple <Pno, Essn> whenever the
employee whose Ssn is Essn works on the project whose number is Pno in
the intermediate relation SSN_PNOS:
• SSN_PNOS ← πEssn, Pno(WORKS_ON)
• Finally, apply the DIVISION operation to the two relations, which gives the
desired employees’ Social Security numbers:
• SSNS(Ssn) ← SSN_PNOS ÷ SMITH_PNOS
• RESULT ← πFname, Lname(SSNS * EMPLOYEE)
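• The DIVISION step can be sketched in Python with sets. Here SSN_PNOS is modelled as a set of (Essn, Pno) pairs and SMITH_PNOS as a set of project numbers. The function name divide and the sample values are hypothetical.

```python
def divide(r, s):
    """R(Y, X) divided by S(X).

    r is a set of (y, x) pairs and s is a set of x values. The result
    holds every y that is paired in r with all of the x values in s.
    """
    candidates = {y for y, _ in r}
    return {y for y in candidates if all((y, x) in r for x in s)}

# Hypothetical (Essn, Pno) pairs: who works on which project.
SSN_PNOS = {("123", 1), ("123", 2),
            ("453", 1), ("453", 2), ("453", 3),
            ("666", 1)}

# Projects that the reference employee works on.
SMITH_PNOS = {1, 2}

SSNS = divide(SSN_PNOS, SMITH_PNOS)
print(sorted(SSNS))
```

• Employee '666' works on project 1 only and is therefore excluded. Employee '453' works on an extra project, which does not matter: DIVISION only requires that all projects in SMITH_PNOS are covered.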
• Generalized Projection
• In basic relational algebra, projection (π) selects columns
(attributes).
• Generalized projection goes further: it allows computing new values
using arithmetic expressions over attributes.
• EMPLOYEE (Ssn, Salary, Deduction, Years_service)
• We want a report showing:
• Net_salary = Salary – Deduction
• Bonus = 2000 × Years_service
• Tax = 0.25 × Salary
• Relational Algebra Expression
• REPORT ← ρ(Ssn, Net_salary, Bonus, Tax) (
π Ssn, Salary – Deduction, 2000 * Years_service, 0.25 * Salary
(EMPLOYEE) )
• π: Selects and computes values.
• ρ: Renames attributes in the result.
• The resulting relation REPORT has the attributes (Ssn, Net_salary, Bonus, Tax).
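• A minimal Python sketch of this generalized projection, assuming the relation is a list of dicts. The two EMPLOYEE tuples are invented for illustration.

```python
# Hypothetical sample tuples for EMPLOYEE(Ssn, Salary, Deduction, Years_service).
EMPLOYEE = [
    {"Ssn": "111", "Salary": 40000, "Deduction": 5000, "Years_service": 3},
    {"Ssn": "222", "Salary": 60000, "Deduction": 8000, "Years_service": 10},
]

# Project and compute in one pass; the dict keys play the role of the
# rename step that names the computed columns.
REPORT = [
    {
        "Ssn": e["Ssn"],
        "Net_salary": e["Salary"] - e["Deduction"],
        "Bonus": 2000 * e["Years_service"],
        "Tax": 0.25 * e["Salary"],
    }
    for e in EMPLOYEE
]

for row in REPORT:
    print(row)
```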
• Aggregate Functions and Grouping
• Aggregate functions operate on sets of values:
• COUNT, SUM, AVERAGE, MAX, MIN
• They summarize data across tuples, and can be applied to:
• the whole relation (no grouping), or
• groups of tuples (grouped by attributes).
• Suppose the EMPLOYEE relation also includes a Dno attribute:
• EMPLOYEE (Ssn, Salary, Dno)
• where Dno is the department number.
• Let’s answer:
• “For each department, how many employees are there and what’s
the average salary?”
• Relational Algebra Expression
• ρR(Dno, No_of_employees, Average_sal)( Dno ℑ COUNT Ssn,
AVERAGE Salary (EMPLOYEE) )
• ℑ (script F): Grouping and aggregation.
• COUNT Ssn: Counts employees per group.
• AVERAGE Salary: Computes average salary per group.
• ρ: Renames attributes for clarity.
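• The grouping and aggregation expression can be sketched in plain Python. The relation is assumed to be a list of dicts, the sample tuples are invented, and they contain no NULLs, so NULL handling is left out of this sketch.

```python
from collections import defaultdict

# Hypothetical sample tuples for EMPLOYEE(Ssn, Salary, Dno).
EMPLOYEE = [
    {"Ssn": "111", "Salary": 30000, "Dno": 5},
    {"Ssn": "222", "Salary": 40000, "Dno": 5},
    {"Ssn": "333", "Salary": 25000, "Dno": 4},
]

# Grouping: partition the tuples by their Dno value.
groups = defaultdict(list)
for e in EMPLOYEE:
    groups[e["Dno"]].append(e)

# Aggregation: one result tuple per group, with renamed attributes.
R = [
    {
        "Dno": dno,
        "No_of_employees": len(members),
        "Average_sal": sum(m["Salary"] for m in members) / len(members),
    }
    for dno, members in sorted(groups.items())
]

for row in R:
    print(row)
```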
• No Grouping Example
• If you apply aggregation without grouping:
• ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE)
• The result is a single tuple containing the employee count and the average salary over the whole relation.
• NULLs are ignored in aggregation.
• Duplicates are not removed.
• The result is still a relation, even if it contains just one value.
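• These points can be observed in SQL through Python's sqlite3 module, where the SQL function AVG plays the role of AVERAGE. The three rows are invented: two employees share the same salary and one has an unknown (NULL) salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT, Salary INTEGER, Dno INTEGER);
    INSERT INTO EMPLOYEE VALUES
        ('111', 30000, 5),
        ('222', 30000, 5),
        ('333', NULL, 4);
""")

# No GROUP BY: the aggregates run over the whole table and return one row.
result = conn.execute(
    "SELECT COUNT(Ssn), COUNT(Salary), AVG(Salary) FROM EMPLOYEE"
).fetchall()

count_ssn, count_sal, avg_sal = result[0]
print(result)
```

• COUNT(Salary) skips the NULL salary, while the two equal salaries are both counted and both enter the average.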
