Mod 2
Relational Model
Key Concepts of the Relational Model
1.Relation (Table)
•Represents a collection of related data.
•Each table has a name (e.g., STUDENT).
2.Tuple (Row)
•Each row represents a single record or an entity (e.g., information about one student).
•Each tuple contains data values for multiple attributes.
3.Attribute (Column)
•Each column represents a specific property of the entity (e.g., Name, Age, GPA) and takes its values from a domain.
4.Domain
•A set of valid values for an attribute.
•Example:
•Usa_phone_numbers: Only valid 10-digit phone numbers.
•Employee_ages: Only integers between 15 and 80.
•Grade_point_averages: Real numbers between 0 and 4.
A relation schema R, denoted by R(A1, A2, ..., An), is made up of a relation name R and a list of attributes,
A1, A2, ..., An.
Each attribute Ai is the name of a role played by some domain D in the relation schema R. D is called the
domain of Ai and is denoted by dom(Ai).
A relation schema is used to describe a relation; R is called the name of this relation. The degree of a
relation is the number of attributes n of its relation schema.
1. Relation Schema
•Describes the structure of a relation (table).
•Includes:
•Relation name (e.g., STUDENT).
•List of attributes (e.g., Name, Ssn, Age, etc.).
•STUDENT(Name: string, Ssn: string, Age: integer, Gpa: real)
Example:
If we have a STUDENT relation schema:
STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa)
This schema has 7 attributes and therefore a degree of 7.
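As a rough illustration, a relation schema can be modeled as a relation name plus an ordered list of attribute/domain pairs. The `RelationSchema` class and the use of Python types as stand-ins for domains are assumptions made for this sketch; they are not part of the relational model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationSchema:
    """Hypothetical helper: a relation name plus ordered (attribute, domain) pairs."""
    name: str
    attributes: tuple  # tuple of (attribute_name, domain) pairs

    @property
    def degree(self) -> int:
        # The degree is the number of attributes in the schema.
        return len(self.attributes)


STUDENT = RelationSchema(
    "STUDENT",
    (
        ("Name", str), ("Ssn", str), ("Home_phone", str), ("Address", str),
        ("Office_phone", str), ("Age", int), ("Gpa", float),
    ),
)

print(STUDENT.name, STUDENT.degree)  # STUDENT 7
```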
2. Relation Names and States
•Uppercase Letters like R, S, and Q → Represent relation schemas.
•Lowercase Letters like r, s, and q → Represent relation states (the actual data inside the relation at a particular moment).
•A relation state refers to the current set of tuples in the relation.
Example:
•STUDENT refers to the schema definition.
•r(STUDENT) refers to the current state of the STUDENT relation (i.e., the current rows/tuples stored in the
table).
3. Tuple Representation
•Tuples are represented by lowercase letters such as t, u, and v.
•An n-tuple (a tuple with n values) is denoted as:
t = <v_1, v_2, ..., v_n>
Where:
•t is the tuple.
•Each v_i represents the value corresponding to attribute A_i.
Example: Consider the following tuple from the STUDENT relation:
t = <'Barbara Benson', '533-69-1238', '(817)839-8461', '7384 Fontana Lane', NULL, 19, 3.25>
In this example:
•t[Name] = 'Barbara Benson' (the value for the Name attribute).
•t[Ssn] = '533-69-1238'.
•t[Office_phone] = NULL.
4. Dot Notation for Attribute Qualification
•The dot notation helps distinguish attributes that may have the same name in different relations.
•It follows this format:
R.A
Where:
•R is the relation name.
•A is the attribute name.
This is essential in cases where two relations share an attribute name.
Example:
•STUDENT.Name refers to the Name attribute from the STUDENT relation.
•EMPLOYEE.Name refers to the Name attribute from the EMPLOYEE relation.
Since both relations have an attribute called Name, the dot notation prevents ambiguity.
5. Accessing Tuple Components
The notation for accessing individual or grouped values within a tuple is versatile:
✅ Accessing a Single Value:
•t[Ai] or t.Ai → Refers to the value of attribute Ai in tuple t.
•Occasionally, t[i] can also be used (especially in programming contexts) where i is the position of the attribute.
✅ Accessing Multiple Values (Subtuple):
•t[Au, Aw, ..., Az] or t.(Au, Aw, ..., Az) → Refers to a subtuple that includes only the specified attributes.
Example:
For the tuple:
t = <'Barbara Benson', '533-69-1238', '(817)839-8461', '7384 Fontana Lane', NULL, 19, 3.25>
•t[Name] or t.Name = 'Barbara Benson'
•t[Ssn, Gpa, Age] = <'533-69-1238', 3.25, 19>
This makes it easy to extract specific data from complex tuples.
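The access notation above can be mimicked in a few lines of Python. This is only a sketch: representing the tuple as a dictionary, using `None` for NULL, and the helper name `subtuple` are all assumptions made for illustration.

```python
# Attribute order of the STUDENT schema used in the example.
ATTRS = ("Name", "Ssn", "Home_phone", "Address", "Office_phone", "Age", "Gpa")

# Model the tuple t as a mapping from attribute name to value; None stands in for NULL.
t = dict(zip(ATTRS, ("Barbara Benson", "533-69-1238", "(817)839-8461",
                     "7384 Fontana Lane", None, 19, 3.25)))


def subtuple(tup, *attrs):
    """Return t[Au, Aw, ..., Az]: the values of the listed attributes, in the order given."""
    return tuple(tup[a] for a in attrs)


print(t["Name"])                         # t[Name]
print(subtuple(t, "Ssn", "Gpa", "Age"))  # t[Ssn, Gpa, Age]
print(t[ATTRS[0]])                       # positional access: the first attribute
```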
Relational model constraints
• In a relational database, constraints are essential for ensuring the accuracy,
reliability, and integrity of the data. These constraints enforce rules that reflect
the conditions of the real-world scenario being modeled. Let's explore the
three main categories of constraints and their key characteristics in detail.
1. Types of Constraints in Relational Databases
• Constraints can be classified into three categories:
(a) Inherent Constraints (Implicit Constraints)
• These constraints are automatically part of the relational model itself.
• They do not need to be explicitly defined by the database designer.
• Example:
• A relation must not contain duplicate tuples (since a relation is defined as a set of
tuples).
• Each attribute in a relation schema must have a unique name.
(b) Schema-Based Constraints (Explicit Constraints)
• These constraints are explicitly defined in the data definition language (DDL).
• They are enforced directly within the database schema and are automatically checked by the
database system.
• Examples of schema-based constraints include:
• Domain Constraints
• Key Constraints
• Constraints on NULL Values
• Entity Integrity Constraints
• Referential Integrity Constraints
(c) Application-Based Constraints (Semantic Constraints/Business Rules)
• These constraints are more complex and are enforced through application logic rather than the
database schema.
• They describe specific business rules that cannot be directly implemented using DDL.
• Example:
• "A student must be at least 18 years old to enroll in a university."
• Such constraints are typically enforced through programming logic in the application layer.
2. Schema-Based Constraints in Detail
(a) Domain Constraints
•Domain constraints specify that the values for an attribute must belong to a predefined domain.
•A domain is a pool of valid values for an attribute. Each attribute is assigned a data type
that defines its domain.
Examples of Data Types for Domains:
•Numeric types: INTEGER, FLOAT, DOUBLE
•Character types: CHAR(n), VARCHAR(n)
•Boolean types: TRUE/FALSE
•Date and Time types: DATE, TIME, TIMESTAMP
•Custom domains: An attribute may use enumerated types or value ranges.
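A value-range domain such as Employee_ages (integers between 15 and 80) can be approximated with a CHECK clause. The sketch below runs the SQL through Python's built-in `sqlite3` module; the table layout and sample rows are assumptions. Note that SQLite does not strictly enforce declared column types, so the CHECK clause is what actually restricts the values here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Emp_ID INTEGER PRIMARY KEY,
        Age    INTEGER CHECK (Age BETWEEN 15 AND 80)
    )
""")

# A value inside the domain is accepted.
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 42)")

# A value outside the domain violates the CHECK constraint.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (2, 12)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```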
(b) Key Constraints
•Primary Key: In the STUDENT relation, the Ssn attribute is the primary key since it uniquely identifies each student.
•Unique Key: Other candidate keys that are not chosen as the primary key are designated as unique keys.
• Example of a Unique Key in SQL:
CREATE TABLE CAR (
License_number VARCHAR(15) PRIMARY KEY,
Engine_serial_number VARCHAR(20) UNIQUE
);
In this example:
•License_number is the primary key.
•Engine_serial_number is a unique key.
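To see both key constraints being checked, the CAR table can be created in an in-memory SQLite database. This is a sketch only: the license and serial numbers are made-up sample values, and `try_insert` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CAR (
        License_number       VARCHAR(15) PRIMARY KEY,
        Engine_serial_number VARCHAR(20) UNIQUE
    )
""")
conn.execute("INSERT INTO CAR VALUES ('TX-1001', 'A69352')")


def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO CAR VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


dup_primary = try_insert(("TX-1001", "B43696"))  # repeats the primary key
dup_unique = try_insert(("TX-2002", "A69352"))   # repeats the unique key
fresh = try_insert(("TX-2002", "B43696"))        # both values are new

print(dup_primary, dup_unique, fresh)  # False False True
```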
(c) NULL Constraints
•The NULL constraint restricts whether an attribute can store NULL values or not.
•If an attribute is marked as NOT NULL, every tuple must have a non-null value for that attribute.
Example in SQL:
CREATE TABLE EMPLOYEE (
Emp_ID INTEGER PRIMARY KEY,
Name VARCHAR(30) NOT NULL,
Department VARCHAR(20) NULL
);
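The effect of NOT NULL can be checked with a short experiment against an in-memory SQLite database. The sample rows are assumptions, and the explicit NULL keyword on Department is left out in this sketch because columns are nullable by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Emp_ID     INTEGER PRIMARY KEY,
        Name       VARCHAR(30) NOT NULL,
        Department VARCHAR(20)          -- nullable by default
    )
""")

# A missing Department is allowed.
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Alice', NULL)")

# A missing Name violates NOT NULL and is rejected.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (2, NULL, 'Research')")
    name_null_rejected = False
except sqlite3.IntegrityError:
    name_null_rejected = True

print(name_null_rejected)
```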
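(d) Entity Integrity Constraints
•No primary key value can be NULL, because the primary key is used to identify individual tuples.
(e) Referential Integrity Constraints
•A foreign key value must either match a primary key value in the referenced relation or be NULL.
•Example: The Student_Ssn in ENROLLMENT must match a valid Ssn from the STUDENT table or be NULL.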
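A minimal sketch of the ENROLLMENT/STUDENT rule, using SQLite through Python. The column layouts (including `Enroll_ID`) and sample rows are assumptions. SQLite only checks foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE STUDENT (Ssn TEXT PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE ENROLLMENT (
        Enroll_ID   INTEGER PRIMARY KEY,
        Student_Ssn TEXT REFERENCES STUDENT(Ssn)
    )
""")
conn.execute("INSERT INTO STUDENT VALUES ('533-69-1238', 'Barbara Benson')")


def try_enroll(enroll_id, ssn):
    """Return True if the ENROLLMENT row is accepted, False if it is rejected."""
    try:
        conn.execute("INSERT INTO ENROLLMENT VALUES (?, ?)", (enroll_id, ssn))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_match = try_enroll(1, "533-69-1238")  # matches an existing Ssn
accepted_null = try_enroll(2, None)            # NULL is allowed
accepted_dangling = try_enroll(3, "000-00-0000")  # no such student

print(accepted_match, accepted_null, accepted_dangling)  # True True False
```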
3. Application-Based (Semantic) Constraints
• These constraints reflect business logic or complex conditions that
cannot be directly implemented using DDL.
• They are enforced using triggers, stored procedures, or application
code.
• Example:
• “An employee's salary cannot exceed their manager’s salary.”
• “A customer can place no more than five orders in a single day.”
• Such rules are best implemented using programming logic within the
application.
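As an illustration of enforcing such a rule in application code, the five-orders-per-day limit might be checked before an order is stored. The function name, the `(customer_id, date)` representation of orders, and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import date

MAX_ORDERS_PER_DAY = 5  # the business rule from the example


def can_place_order(existing_orders, customer_id, day):
    """existing_orders: iterable of (customer_id, date) pairs already stored."""
    placed = Counter(existing_orders)[(customer_id, day)]
    return placed < MAX_ORDERS_PER_DAY


today = date(2024, 1, 15)
orders = [("C1", today)] * 5 + [("C2", today)] * 2

print(can_place_order(orders, "C1", today))  # False: C1 already has five orders
print(can_place_order(orders, "C2", today))  # True: C2 has only two
```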
4. Data Dependencies
• Data dependencies define relationships between attributes and help ensure database
consistency during the normalization process.
• The two main types are:
• Functional Dependencies: Relationships between attributes where the value of one attribute (or set of attributes) determines the value of another.
• Multivalued Dependencies: Occur when one attribute value is associated with a set of values of another attribute, independently of the remaining attributes.
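A functional dependency X → Y can be tested against a given relation state: no two tuples may agree on X and differ on Y. The sketch below uses hypothetical sample rows and a hypothetical helper `fd_holds`. A state can only refute a dependency; the dependency itself is a property of the schema, not of one particular state.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and compare later ones.
        if seen.setdefault(key, val) != val:
            return False
    return True


rows = [
    {"Ssn": "111", "Name": "Ann", "Dno": 4},
    {"Ssn": "222", "Name": "Bob", "Dno": 4},
    {"Ssn": "333", "Name": "Ann", "Dno": 5},
]

print(fd_holds(rows, ["Ssn"], ["Name"]))  # True: each Ssn has one Name
print(fd_holds(rows, ["Name"], ["Dno"]))  # False: 'Ann' appears with Dno 4 and 5
```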
Relational Model Operations
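• SELECT (σ) keeps only the tuples of a relation that satisfy a condition.
• Select employees who work in department 4:
• σ Dno = 4 (EMPLOYEE)
• Select employees with salary > 30000:
• σ Salary > 30000 (EMPLOYEE)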
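The two selections can be sketched in Python by treating a relation as a list of dictionaries and σ as a filter. The `select` helper and the EMPLOYEE rows are assumptions for illustration.

```python
def select(relation, predicate):
    """σ<predicate>(relation): keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


EMPLOYEE = [
    {"Name": "Smith", "Salary": 30000, "Dno": 5},
    {"Name": "Wong", "Salary": 40000, "Dno": 5},
    {"Name": "Zelaya", "Salary": 25000, "Dno": 4},
    {"Name": "Wallace", "Salary": 43000, "Dno": 4},
]

dept4 = select(EMPLOYEE, lambda t: t["Dno"] == 4)             # σ Dno = 4 (EMPLOYEE)
high_paid = select(EMPLOYEE, lambda t: t["Salary"] > 30000)   # σ Salary > 30000 (EMPLOYEE)

print([t["Name"] for t in dept4])      # ['Zelaya', 'Wallace']
print([t["Name"] for t in high_paid])  # ['Wong', 'Wallace']
```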
• The general RENAME operation, applied to a relation R of degree n, can be written in any of three forms:
• ρ S(B1, B2, … , Bn)(R)
• ρ S(R)
• ρ (B1, B2, … , Bn)(R)
where the symbol ρ (rho) is used to denote the RENAME operator, S is the new relation name, and B1, B2,
… , Bn are the new attribute names. The first expression renames both the relation and its attributes, the
second renames the relation only, and the third renames the attributes only. If the attributes of R are (A1,
A2, … , An) in that order, then each Ai is renamed as Bi.
• JOIN combines two tables based on a related attribute (like a foreign key).
• It merges matching rows.
• R ⋈<condition> S
•R and S are two relations (tables).
•The condition is usually something like: R.attr = S.attr
• The JOIN operation, denoted by ⋈, is used to combine related tuples from
two relations into single “longer” tuples.
• This operation is very important for any relational database with more
than a single relation because it allows us to process relationships among
relations.
• Example: Join EMPLOYEE and DEPARTMENT where Dno = Dnumber:
• EMPLOYEE ⋈ Dno = Dnumber DEPARTMENT
• 1. EQUIJOIN: Equality-based Join
• The most common use of JOIN involves join conditions with equality
comparisons only. Such a JOIN, where the only comparison operator
used is =, is called an EQUIJOIN.
Example: Suppose we have two relations, PROJECT and DEPARTMENT.
• If we want to join PROJECT and DEPARTMENT where PROJECT.Dnum = DEPARTMENT.Dnumber, this is an EQUIJOIN:
• PROJECT ⋈ PROJECT.Dnum = DEPARTMENT.Dnumber DEPARTMENT
• Both Dnum and Dnumber are kept, even though they have the same value.
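A small sketch of this EQUIJOIN in Python, with relations as lists of dictionaries. The `theta_join` helper and the sample rows are assumptions. Because the two relations share no attribute names here, merging the dictionaries keeps both Dnum and Dnumber in each result tuple.

```python
def theta_join(r, s, condition):
    """R ⋈<condition> S: concatenate every pair of tuples that satisfies the condition."""
    return [{**t_r, **t_s} for t_r in r for t_s in s if condition(t_r, t_s)]


PROJECT = [
    {"Pname": "ProductX", "Pnumber": 1, "Dnum": 5},
    {"Pname": "Computerization", "Pnumber": 10, "Dnum": 4},
]
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5},
    {"Dname": "Administration", "Dnumber": 4},
    {"Dname": "Headquarters", "Dnumber": 1},
]

# EQUIJOIN: the condition uses only the = comparison.
result = theta_join(PROJECT, DEPARTMENT, lambda p, d: p["Dnum"] == d["Dnumber"])

for row in result:
    print(row)
```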
• 2. NATURAL JOIN: Auto-match on Common Attribute Names
• A NATURAL JOIN automatically finds common attribute names in both
relations and performs an EQUIJOIN, but removes duplicate columns.
• To use NATURAL JOIN:
• Let’s rename DEPARTMENT.Dnumber to Dnum so it matches
PROJECT.Dnum:
• DEPT ← ρ(Dname, Dnum, Mgr_ssn)(DEPARTMENT)
• PROJ_DEPT ← PROJECT * DEPT
• The same operation may also be written with the join symbol and no condition: PROJ_DEPT ← PROJECT ⋈ DEPT
• Dnum is the join attribute, and since it is present in both relations, it appears only
once in the result.
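The rename-then-natural-join steps can be sketched as follows. The helpers `rename` and `natural_join` and the sample rows are hypothetical; this `rename` takes an old-name to new-name mapping rather than the positional attribute list used in the ρ notation.

```python
def rename(relation, mapping):
    """ρ: rename attributes according to mapping {old_name: new_name}."""
    return [{mapping.get(a, a): v for a, v in t.items()} for t in relation]


def natural_join(r, s):
    """R * S: equijoin on all shared attribute names, keeping one copy of each."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [
        {**t_r, **t_s}  # shared attributes have equal values, so one copy survives
        for t_r in r
        for t_s in s
        if all(t_r[a] == t_s[a] for a in common)
    ]


PROJECT = [
    {"Pname": "ProductX", "Pnumber": 1, "Dnum": 5},
    {"Pname": "Computerization", "Pnumber": 10, "Dnum": 4},
]
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "333445555"},
    {"Dname": "Administration", "Dnumber": 4, "Mgr_ssn": "987654321"},
]

DEPT = rename(DEPARTMENT, {"Dnumber": "Dnum"})
PROJ_DEPT = natural_join(PROJECT, DEPT)

print(list(PROJ_DEPT[0]))  # ['Pname', 'Pnumber', 'Dnum', 'Dname', 'Mgr_ssn']
```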
• JOIN as CARTESIAN PRODUCT + SELECTION
• Any JOIN can be written as a Cartesian Product (×) followed by a Selection (σ).
• PROJECT ⋈ PROJECT.Dnum = DEPARTMENT.Dnumber DEPARTMENT
≡ σ(PROJECT.Dnum = DEPARTMENT.Dnumber)(PROJECT × DEPARTMENT)
• You first pair all combinations of tuples (like in a cross join), then filter only those
where Dnum = Dnumber.
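The equivalence can be checked on a small example. In this sketch tuples are plain Python tuples with assumed layouts (Pname, Dnum) and (Dname, Dnumber), and the rows are made up.

```python
from itertools import product

PROJECT = [("ProductX", 5), ("Computerization", 4), ("Reorganization", 1)]  # (Pname, Dnum)
DEPARTMENT = [("Research", 5), ("Administration", 4)]                       # (Dname, Dnumber)

# Step 1: the Cartesian product pairs every PROJECT tuple with every DEPARTMENT tuple.
cartesian = [p + d for p, d in product(PROJECT, DEPARTMENT)]

# Step 2: the selection keeps only the combined tuples where Dnum = Dnumber.
selected = [t for t in cartesian if t[1] == t[3]]

# The join computed directly, for comparison.
joined = [p + d for p in PROJECT for d in DEPARTMENT if p[1] == d[1]]

print(len(cartesian), len(selected))  # 6 2
print(selected == joined)             # True
```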
• Multi-way Joins (n-way JOIN)
• You can chain multiple JOINs:
• ((PROJECT ⋈ Dnum=Dnumber DEPARTMENT) ⋈ Mgr_ssn=Ssn
EMPLOYEE)
• This combines:
• Each PROJECT with its controlling DEPARTMENT
• Then adds the EMPLOYEE who manages that department
• This gives a single record that includes:
• Project Info
• Department Info
• Manager Info
• NATURAL JOIN without Renaming
• If both relations already have attributes with the same name (e.g.,
Dnumber), NATURAL JOIN works without renaming.
• DEPT_LOCS ← DEPARTMENT ⋈ DEPT_LOCATIONS
• 6. JOIN Selectivity
• If:
• R has nR tuples
• S has nS tuples
• Then:
• Cartesian Product: nR × nS tuples
• JOIN: between 0 and nR × nS tuples
• Join Selectivity = (number of tuples in the join result) / (nR × nS)
• The ratio lies between 0 and 1: a value near 0 means few tuple pairs match, and a value near 1 means most pairs match.
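A short calculation makes the ratio concrete. The helper `join_selectivity` and the sample rows are assumptions, and the sketch assumes both relations are non-empty so that the denominator is not zero.

```python
def join_selectivity(r, s, condition):
    """Size of the join result divided by the size of the Cartesian product."""
    matches = sum(1 for t_r in r for t_s in s if condition(t_r, t_s))
    return matches / (len(r) * len(s))


EMPLOYEE = [{"Dno": 5}, {"Dno": 5}, {"Dno": 4}, {"Dno": 4}, {"Dno": 1}]  # nR = 5
DEPARTMENT = [{"Dnumber": 5}, {"Dnumber": 4}, {"Dnumber": 1}]            # nS = 3

# Each employee matches exactly one department, so 5 of the 15 pairs survive.
sel = join_selectivity(EMPLOYEE, DEPARTMENT, lambda e, d: e["Dno"] == d["Dnumber"])
print(sel)
```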
SQL Implementation of JOINs
• Method 1: Join in WHERE clause
• SELECT * FROM PROJECT, DEPARTMENT WHERE PROJECT.Dnum =
DEPARTMENT.Dnumber;
• Method 2: Explicit JOIN
• SELECT * FROM PROJECT JOIN DEPARTMENT ON PROJECT.Dnum =
DEPARTMENT.Dnumber;
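Both forms can be run against an in-memory SQLite database to confirm that they return the same rows. The table definitions and sample data below are assumptions, and an ORDER BY is added so the two result lists can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER PRIMARY KEY);
    CREATE TABLE PROJECT (Pname TEXT, Pnumber INTEGER PRIMARY KEY, Dnum INTEGER);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
    INSERT INTO PROJECT VALUES ('ProductX', 1, 5), ('Computerization', 10, 4),
                               ('Unassigned', 99, 7);
""")

# Method 1: join condition in the WHERE clause.
where_style = conn.execute(
    "SELECT * FROM PROJECT, DEPARTMENT "
    "WHERE PROJECT.Dnum = DEPARTMENT.Dnumber ORDER BY Pnumber"
).fetchall()

# Method 2: explicit JOIN ... ON.
join_style = conn.execute(
    "SELECT * FROM PROJECT JOIN DEPARTMENT "
    "ON PROJECT.Dnum = DEPARTMENT.Dnumber ORDER BY Pnumber"
).fetchall()

print(where_style == join_style)
print(where_style)
```

The project with Dnum 7 has no matching department, so it appears in neither result.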
• JOINs in Relational Algebra Core Set
• The basic relational algebra operations are:
• σ (Selection)
• π (Projection)
• × (Cartesian Product)
• ρ (Rename)
• ∪ (Union)
• – (Difference)
• JOIN is not strictly necessary—it can be built from:
• R ⋈<condition> S ≡ σ<condition>(R × S)
• NATURAL JOIN from Basic Ops:
• Rename to avoid name conflicts
• Cartesian Product
• Selection on matching attributes
• Projection to remove duplicates
• 1. INNER JOIN
• Definition:
• An INNER JOIN returns only the rows where there is a match in both
tables based on the join condition.
• SELECT E.Name, D.DeptName
• FROM EMPLOYEE E
• INNER JOIN DEPARTMENT D ON E.DeptID = D.DeptID;
• 2. OUTER JOIN
• Definition:
• OUTER JOIN includes matching rows and the non-matching rows
from one or both tables, filling in NULLs for missing values.
• Types of Outer Joins:
• LEFT OUTER JOIN: All rows from the left table + matching rows from
the right.
• RIGHT OUTER JOIN: All rows from the right table + matching rows
from the left.
• FULL OUTER JOIN: All rows from both tables; unmatched rows are
filled with NULLs.
• LEFT OUTER JOIN Example:
• SELECT E.Name, D.DeptName
• FROM EMPLOYEE E
• LEFT OUTER JOIN DEPARTMENT D ON E.DeptID = D.DeptID;
• RIGHT OUTER JOIN Example:
• SELECT E.Name, D.DeptName
• FROM EMPLOYEE E
• RIGHT OUTER JOIN DEPARTMENT D ON E.DeptID = D.DeptID;
• FULL OUTER JOIN: Includes all records from both tables. Where there is no match, NULLs are filled in.
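The difference between an inner and a left outer join shows up as soon as one employee has no department. The sketch below uses SQLite through Python with assumed table layouts and sample rows. Only LEFT OUTER JOIN is shown because older SQLite versions (before 3.39) do not support RIGHT or FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
    CREATE TABLE EMPLOYEE (Name TEXT, DeptID INTEGER);
    INSERT INTO DEPARTMENT VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO EMPLOYEE VALUES ('Ann', 1), ('Bob', NULL);
""")

inner = conn.execute("""
    SELECT E.Name, D.DeptName
    FROM EMPLOYEE E
    INNER JOIN DEPARTMENT D ON E.DeptID = D.DeptID
    ORDER BY E.Name
""").fetchall()

left = conn.execute("""
    SELECT E.Name, D.DeptName
    FROM EMPLOYEE E
    LEFT OUTER JOIN DEPARTMENT D ON E.DeptID = D.DeptID
    ORDER BY E.Name
""").fetchall()

print(inner)  # [('Ann', 'Research')]
print(left)   # [('Ann', 'Research'), ('Bob', None)]
```

Bob has no department, so the inner join drops him while the left outer join keeps him with a NULL (shown as `None`) for DeptName.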
• 3. CROSS JOIN (Cartesian Product)
• Definition:
• A CROSS JOIN returns the Cartesian product of two tables—every
row from the first table is combined with every row from the
second.
• SELECT E.Name, D.DeptName
• FROM EMPLOYEE E
• CROSS JOIN DEPARTMENT D;