Module-5-3
Part 1 - Syllabus
SQL DML (Data Manipulation Language)
● SQL queries on single and multiple tables
● Nested queries (correlated and non-correlated)
● Aggregation and grouping
● Views
● Assertions
● Triggers
● SQL data types
What We’ll Cover in This Session:
JOIN)
4. Practical Examples and Exercises
What is SQL DML?
SELECT column1, column2, ...
FROM table_name
WHERE condition
ORDER BY column ASC|DESC;
Key Components:
● SELECT: Specifies the columns to retrieve.
● FROM: Specifies the table(s) to query.
● WHERE: Filters rows based on conditions.
● ORDER BY: Sorts the result set.
Example - Querying a Single Table
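SELECT EmployeeID, FirstName, LastName, Salary
FROM Employees
WHERE Salary > 50000
ORDER BY Salary DESC;
Output Explanation:
● Returns the ID, name, and salary of every employee earning more than 50,000.
● Rows are sorted from the highest salary to the lowest.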
Problem:
Data is often spread across multiple tables (e.g., Employees and Departments).
Solution:
Use JOINs to combine data from two or more tables based on related columns.
Types of JOINs:
1. INNER JOIN: Returns only the rows that have a match in both tables.
2. LEFT JOIN: Returns all rows from the left table and the matching rows from the right table; unmatched right-side columns are NULL.
3. RIGHT JOIN: Returns all rows from the right table and the matching rows from the left table; unmatched left-side columns are NULL.
4. FULL OUTER JOIN: Returns all rows from both tables, matched where possible and filled with NULLs where there is no match.
INNER JOIN Example
Scenario:
Retrieve employee names and their department names.
Tables:
● Employees: Contains EmployeeID, FirstName, LastName, DepartmentID.
● Departments: Contains DepartmentID, DepartmentName.
Query:
SELECT Employees.FirstName, Employees.LastName, Departments.DepartmentName
FROM Employees
INNER JOIN Departments
ON Employees.DepartmentID = Departments.DepartmentID;
Output Explanation:
● Combines data from Employees and Departments using DepartmentID.
● Only matching rows are returned.
LEFT JOIN Example
Scenario:
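Retrieve all employees and their department names (even if some employees don’t belong to a department).
Query:
SELECT Employees.FirstName, Employees.LastName, Departments.DepartmentName
FROM Employees
LEFT JOIN Departments
ON Employees.DepartmentID = Departments.DepartmentID;
Output Explanation:
● Every employee appears in the result.
● DepartmentName is NULL for employees with no matching department.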
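RIGHT JOIN and FULL OUTER JOIN are listed among the join types but have no example in this session. The following is a minimal sketch, assuming the same Employees and Departments tables. The UNION ALL form is an assumption for engines such as MySQL that do not accept FULL OUTER JOIN, and it relies on EmployeeID never being NULL.

```sql
-- RIGHT JOIN: every department, with employee names where a match exists.
-- Departments without employees appear with NULL names.
SELECT Employees.FirstName, Employees.LastName, Departments.DepartmentName
FROM Employees
RIGHT JOIN Departments
    ON Employees.DepartmentID = Departments.DepartmentID;

-- FULL OUTER JOIN (e.g. PostgreSQL, SQL Server): all rows from both tables.
SELECT Employees.FirstName, Employees.LastName, Departments.DepartmentName
FROM Employees
FULL OUTER JOIN Departments
    ON Employees.DepartmentID = Departments.DepartmentID;

-- Common emulation where FULL OUTER JOIN is unavailable (e.g. MySQL):
-- a LEFT JOIN plus the department-only rows that the LEFT JOIN misses.
SELECT E.FirstName, E.LastName, D.DepartmentName
FROM Employees AS E
LEFT JOIN Departments AS D ON E.DepartmentID = D.DepartmentID
UNION ALL
SELECT NULL, NULL, D.DepartmentName
FROM Departments AS D
LEFT JOIN Employees AS E ON E.DepartmentID = D.DepartmentID
WHERE E.EmployeeID IS NULL;
```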
Practical Exercise
Task 1:
Write a query to retrieve all employees with salaries greater than 60000, sorted by salary in ascending order.
Task 2:
Write a query to retrieve all departments and the number of employees in each department.
Hint:
Next Steps:
Subqueries
5. Practical Exercises
What are Nested Queries?
Definition:
● A nested query (or subquery) is a query embedded within another SQL query.
● It allows you to break down complex problems into smaller, logical steps.
Use Cases:
● Retrieve data based on conditions derived from another query.
● Perform calculations or comparisons using intermediate results.
Types of Nested Queries:
1. Non-Correlated Subqueries : Independent of the outer query.
2. Correlated Subqueries : Dependent on the outer query.
Non-Correlated Subqueries
Definition:
● A non-correlated subquery executes independently of the outer query.
● The result of the subquery is computed once and used by the outer query.
Syntax Example:
SELECT column1, column2
FROM table_name
WHERE column1 = (SELECT column1 FROM another_table WHERE condition);
Key Characteristics:
● Executes first and returns a single value (compared with =, >, <) or a set of values (compared with IN).
● Outer query uses the result as a condition.
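When the subquery returns a set of values rather than one value, the outer query compares against it with IN. A minimal sketch, assuming the Employees and Departments tables from the JOIN examples; the department name 'Marketing' is hypothetical.

```sql
-- The subquery runs independently of the outer query and returns
-- a set of DepartmentIDs, so the outer query uses IN rather than =.
SELECT EmployeeID, FirstName, LastName
FROM Employees
WHERE DepartmentID IN (
    SELECT DepartmentID
    FROM Departments
    WHERE DepartmentName IN ('Sales', 'Marketing')
);
```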
Example - Non-Correlated Subquery
Scenario:
Find employees whose salary is greater than the average salary.
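Query:
SELECT EmployeeID, FirstName, LastName, Salary
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);
Correlated Subqueries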
Definition:
● A correlated subquery depends on the outer query for its values.
● It executes once for each row processed by the outer query.
Syntax Example:
SELECT column1, column2
FROM table_name AS outer_table
WHERE column1 = (SELECT column1 FROM another_table
                 WHERE condition AND outer_table.column = another_table.column);
Key Characteristics:
● Executes repeatedly, once for each row in the outer query.
● Often used for row-by-row comparisons.
Example - Correlated Subquery
Scenario:
Find employees who earn more than the average salary in their department.
Query:
SELECT EmployeeID, FirstName, LastName, Salary, DepartmentID
FROM Employees AS E1
WHERE Salary > (
    SELECT AVG(Salary)
    FROM Employees AS E2
    WHERE E1.DepartmentID = E2.DepartmentID
);
Explanation:
1. For each employee (E1), the subquery calculates the average salary of employees in the
same department (E2).
2. The outer query retrieves employees whose salary exceeds this average.
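The same result can often be obtained without a correlated subquery by computing each department's average once and joining to it. This is a sketch of that alternative, not the form used in the slides; the alias DeptAvg is hypothetical.

```sql
-- Compute each department's average once in a derived table,
-- then join every employee to their own department's average.
SELECT E.EmployeeID, E.FirstName, E.LastName, E.Salary, E.DepartmentID
FROM Employees AS E
INNER JOIN (
    SELECT DepartmentID, AVG(Salary) AS AvgSalary
    FROM Employees
    GROUP BY DepartmentID
) AS DeptAvg
    ON E.DepartmentID = DeptAvg.DepartmentID
WHERE E.Salary > DeptAvg.AvgSalary;
```

Whether this form runs faster depends on the database's optimizer, which may already rewrite the correlated version internally.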
Differences Between Correlated and Non-Correlated Subqueries
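● Dependency: A non-correlated subquery is independent of the outer query; a correlated subquery references columns of the outer query.
● Execution: A non-correlated subquery is computed once; a correlated subquery is evaluated once for each row processed by the outer query.
● Typical use: Non-correlated subqueries supply a fixed value or set of values as a condition; correlated subqueries perform row-by-row comparisons.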
Practical Exercise
Task 1:
Write a query to find employees whose salary is greater than the highest salary in the "Sales" department.
Hint:
Use a non-correlated subquery to find the maximum salary in the "Sales" department.
Task 2:
Write a query to find departments where the total salary of employees exceeds 500,000.
Hint:
Use a correlated subquery to calculate the total salary for each department.
Recap and Key Takeaways Session 3.2
Definition:
● Aggregation involves summarizing or combining data into a single value or smaller set of values.
● It’s commonly used to calculate totals, averages, counts, etc., from large datasets.
Key Aggregation Functions in SQL:
1. COUNT: Counts the number of rows.
2. SUM: Calculates the total of numeric values.
3. AVG: Computes the average of numeric values.
4. MIN: Finds the smallest value.
5. MAX: Finds the largest value.
Example - Basic Aggregation
Scenario:
Find the total salary paid to all employees.
Query:
SELECT SUM(Salary) AS TotalSalary
FROM Employees;
Explanation:
● SUM(Salary) calculates the total salary of all employees.
● AS TotalSalary renames the result column for better readability.
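Several aggregation functions can appear in one SELECT. A minimal sketch over the same Employees table; the column aliases are illustrative.

```sql
-- One pass over Employees, five summary values in a single result row.
-- COUNT(*) counts rows; SUM, AVG, MIN and MAX skip NULL salaries.
SELECT COUNT(*)    AS EmployeeCount,
       SUM(Salary) AS TotalSalary,
       AVG(Salary) AS AvgSalary,
       MIN(Salary) AS MinSalary,
       MAX(Salary) AS MaxSalary
FROM Employees;
```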
Grouping Data with GROUP BY
Definition:
● The GROUP BY clause groups rows that have the same values in specified columns into summary rows.
● It’s often used with aggregation functions to perform calculations on each group.
Syntax Example:
SELECT column1, AGGREGATE_FUNCTION(column2)
FROM table_name
GROUP BY column1;
Example:
Find the total salary paid in each department.
Query:
SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID;
Explanation:
● Groups employees by DepartmentID.
● Calculates the total salary for each department.
Filtering Groups with HAVING
Definition:
● The HAVING clause filters groups based on conditions, similar to how WHERE filters rows.
● It’s used after GROUP BY to apply conditions to aggregated results.
Syntax Example:
SELECT column1, AGGREGATE_FUNCTION(column2)
FROM table_name
GROUP BY column1
HAVING condition;
Example:
Find departments where the total salary exceeds 500,000.
Filtering Groups with HAVING (Continued)
Query:
SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
GROUP BY DepartmentID
HAVING SUM(Salary) > 500000;
Explanation:
● Groups employees by DepartmentID.
● Filters only those departments where the total salary exceeds 500,000.
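WHERE and HAVING can be used in the same query: WHERE removes rows before grouping, and HAVING removes groups after aggregation. A minimal sketch; the 30000 row threshold is a hypothetical value chosen for illustration.

```sql
SELECT DepartmentID, SUM(Salary) AS TotalSalary
FROM Employees
WHERE Salary > 30000          -- row filter: applied before grouping
GROUP BY DepartmentID
HAVING SUM(Salary) > 500000   -- group filter: applied after aggregation
ORDER BY TotalSalary DESC;
```

Because the WHERE clause runs first, the totals here count only salaries above 30,000.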
Combining GROUP BY and HAVING
Scenario:
Find departments with more than 10 employees and an average salary greater than 60,000.
Query:
SELECT DepartmentID, COUNT(*) AS EmployeeCount, AVG(Salary) AS AvgSalary
FROM Employees
GROUP BY DepartmentID
HAVING COUNT(*) > 10 AND AVG(Salary) > 60000;
Explanation:
● Groups employees by DepartmentID.
● Counts the number of employees (COUNT(*)) and calculates the average salary (AVG(Salary)) for each department.
● Filters only departments with more than 10 employees and an average salary > 60,000.
Practical Exercise
Task 1:
Write a query to find departments where the minimum salary is less than 40,000.
Hint:
Questions?
Topic 3.4: Views, Assertions (with Examples)
Module 3
What We’ll Cover in This Session 3.4
1. Introduction to Views
● What are Views?
● Creating and Using Views
2. Types of Views
● Simple Views
● Complex Views
3. Assertions in SQL
● What are Assertions?
● Syntax and Examples
4. Practical Examples and Exercises
Creating and Using Views
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Example:
Create a view to retrieve employee names and their departments.
Query:
CREATE VIEW EmployeeDepartmentView AS
SELECT Employees.FirstName, Employees.LastName, Departments.DepartmentName
FROM Employees
INNER JOIN Departments
ON Employees.DepartmentID = Departments.DepartmentID;
Using the View:
SELECT * FROM EmployeeDepartmentView;
Explanation:
● The CREATE VIEW statement defines the view.
● The view can then be queried like a regular table.
Types of Views
Simple Views:
● Based on a single table.
● Do not involve complex operations like joins or aggregations.
Example:
CREATE VIEW HighSalaryEmployees AS
SELECT FirstName, LastName, Salary
FROM Employees
WHERE Salary > 60000;
Complex Views:
● Involve multiple tables, joins, or aggregations.
Example:
CREATE VIEW DepartmentSalarySummary AS
SELECT DepartmentID, COUNT(*) AS EmployeeCount, AVG(Salary) AS AvgSalary
FROM Employees
GROUP BY DepartmentID;
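A complex view is queried like any other table and can itself be joined and filtered. A minimal sketch, assuming the DepartmentSalarySummary view and the Departments table defined earlier; the aliases S and D are illustrative.

```sql
-- Read from the summary view and add the department name.
SELECT D.DepartmentName, S.EmployeeCount, S.AvgSalary
FROM DepartmentSalarySummary AS S
INNER JOIN Departments AS D
    ON S.DepartmentID = D.DepartmentID
WHERE S.EmployeeCount > 10
ORDER BY S.AvgSalary DESC;
```

Views built on GROUP BY or aggregate functions are generally read-only: most systems do not allow INSERT, UPDATE, or DELETE through them.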
Modifying and Dropping Views
Modifying a View:
Use CREATE OR REPLACE VIEW to update the definition of an existing view.
Example:
CREATE OR REPLACE VIEW HighSalaryEmployees AS
SELECT FirstName, LastName, Salary
FROM Employees
WHERE Salary > 70000;
Dropping a View:
Use DROP VIEW to delete a view.
Example:
DROP VIEW HighSalaryEmployees;
What are Assertions?
Definition:
● An assertion is a database constraint that ensures a condition is always true across the entire database.
● If the condition becomes false, the database rejects the operation that caused the violation.
Key Characteristics:
● Assertions are global constraints that can span multiple tables.
● CREATE ASSERTION is defined in the SQL standard, but most RDBMSs do not implement it (e.g., MySQL does not support assertions).
Syntax (Theoretical):
CREATE ASSERTION assertion_name
CHECK (condition);
Example of Assertions
Scenario:
Ensure that no department has more than 20 employees.
Query (Theoretical):
CREATE ASSERTION MaxEmployeesPerDepartment
CHECK (
    NOT EXISTS (
        SELECT DepartmentID
        FROM Employees
        GROUP BY DepartmentID
        HAVING COUNT(*) > 20
    )
);
Explanation:
● The assertion checks that no department exceeds 20 employees.
● If a transaction violates this condition, it will be rejected.
Note:
Assertions are not widely implemented in modern RDBMSs. Instead, similar functionality can often be achieved using triggers or application-level logic.
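As one illustration of the trigger-based alternative, the 20-employee rule could be approximated in MySQL as sketched below. The trigger name is hypothetical, and the sketch covers only INSERT; moving an employee between departments would need a similar BEFORE UPDATE trigger, and concurrent transactions could still slip past the check without extra locking.

```sql
-- MySQL-style sketch. DELIMITER is a mysql client command, not SQL.
DELIMITER //

CREATE TRIGGER EnforceMaxEmployeesPerDepartment
BEFORE INSERT ON Employees
FOR EACH ROW
BEGIN
    DECLARE current_count INT;

    -- Count the employees already in the target department.
    SELECT COUNT(*) INTO current_count
    FROM Employees
    WHERE DepartmentID = NEW.DepartmentID;

    -- Reject the insert if it would push the department past 20.
    IF current_count >= 20 THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'A department cannot have more than 20 employees';
    END IF;
END//

DELIMITER ;
```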
Practical Exercise
Task 1:
Create a view to retrieve all employees with salaries greater than 50,000.
Hint:
Module 3
What We’ll Cover in This Session 3.5
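1. Introduction to Triggers
● What are Triggers?
Definition:
● A trigger is a block of SQL that the database runs automatically when an INSERT, UPDATE, or DELETE occurs on a table.
Key Characteristics:
● Triggers can enforce business rules, log changes, or maintain data integrity.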
Trigger Syntax
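Syntax:
CREATE TRIGGER trigger_name
{BEFORE | AFTER} {INSERT | UPDATE | DELETE}
ON table_name
FOR EACH ROW
BEGIN
    -- Trigger logic here
END;
Explanation of Components:
● BEFORE | AFTER: Whether the trigger runs before or after the triggering event.
● INSERT | UPDATE | DELETE: The event on table_name that fires the trigger.
● FOR EACH ROW: The trigger body runs once for every affected row.
● BEGIN ... END: Encloses the statements that make up the trigger logic.
Example - AFTER Trigger: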
Scenario:
Log every deletion from the Employees table into a DeletedEmployees table.
Query:
CREATE TRIGGER LogEmployeeDeletion
AFTER DELETE ON Employees
FOR EACH ROW
BEGIN
    INSERT INTO DeletedEmployees (EmployeeID, FirstName, LastName, DeletionDate)
    VALUES (OLD.EmployeeID, OLD.FirstName, OLD.LastName, NOW());
END;
Explanation:
● The trigger fires after a row is deleted from the Employees table.
● It inserts the deleted row's details into the DeletedEmployees table, along with the current timestamp (NOW()).
Trigger Timing Options
● BEFORE: Executes the trigger logic before the triggering event.
● AFTER: Executes the trigger logic after the triggering event.
Example - BEFORE Trigger:
Prevent inserting employees with salaries below 30,000.
Query:
CREATE TRIGGER CheckMinSalary
BEFORE INSERT ON Employees
FOR EACH ROW
BEGIN
    IF NEW.Salary < 30000 THEN
        SIGNAL SQLSTATE '45000'
        SET MESSAGE_TEXT = 'Salary must be at least 30,000';
    END IF;
END;
Explanation:
● The trigger checks the salary before insertion.
● If the new salary is below 30,000, it raises an error using SIGNAL, which cancels the insert.
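When a trigger like this is created from the mysql command-line client, the semicolons inside BEGIN ... END would end the statement too early, so the delimiter is changed temporarily. The sketch below shows that and two test inserts; the sample rows are hypothetical and assume the Employees columns used in this module.

```sql
-- mysql client only: switch the statement delimiter so the semicolons
-- inside BEGIN ... END do not end CREATE TRIGGER prematurely.
DELIMITER //

CREATE TRIGGER CheckMinSalary
BEFORE INSERT ON Employees
FOR EACH ROW
BEGIN
    IF NEW.Salary < 30000 THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'Salary must be at least 30,000';
    END IF;
END//

DELIMITER ;

-- Hypothetical test rows: the first insert should be accepted,
-- the second should be rejected with the message above.
INSERT INTO Employees (EmployeeID, FirstName, LastName, Salary)
VALUES (101, 'Asha', 'Rao', 45000);

INSERT INTO Employees (EmployeeID, FirstName, LastName, Salary)
VALUES (102, 'Ben', 'Lee', 25000);
```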
SQL Data Types
Definition:
● Data Types define the kind of data that can be stored in a column.
● Choosing the right data type ensures efficient storage and prevents errors.
Common SQL Data Types:
1. Numeric:
● Use INT for whole numbers, DECIMAL for precise decimals.
2. String:
● Use VARCHAR for variable-length strings, CHAR for fixed-length strings.
3. Date/Time:
● Use DATE for calendar dates.
Scenario:
Create a table to store employee details with appropriate data types.
Query:
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Salary DECIMAL(10, 2),
    HireDate DATE,
    IsActive BOOLEAN
);
Explanation:
● EmployeeID: Integer for unique identification.
● FirstName and LastName: Variable-length strings of up to 50 characters.
● Salary: Decimal with up to 10 digits in total, 2 of them after the decimal point.
● HireDate: Stores the date of hiring.
● IsActive: Boolean to indicate active status.
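To see the column types in use, a row can be inserted and queried as sketched below. This assumes the table is named Employees and has exactly the columns explained above; the sample values are hypothetical.

```sql
-- Hypothetical sample row: each value matches its column's data type.
INSERT INTO Employees (EmployeeID, FirstName, LastName, Salary, HireDate, IsActive)
VALUES (1, 'Asha', 'Rao', 55000.50, '2023-04-01', TRUE);

-- DECIMAL stores the salary exactly, so totals are not affected by
-- floating-point rounding; DATE values can be compared directly.
SELECT SUM(Salary) AS TotalSalary
FROM Employees
WHERE IsActive = TRUE
  AND HireDate >= '2023-01-01';
```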
Practical Exercise
Task 1:
Create a trigger that logs updates to the Salary column in the Employees table into a SalaryChangeLog table.
Hint:
Use an AFTER UPDATE trigger and insert the old and new salary values into the log table.
Task 2:
Design a table to store customer information with appropriate data types. Include fields for CustomerID, Name, Email, Phone, and RegistrationDate.
Hint:
Choose numeric, string, and date/time data types as needed.
Recap and Key Takeaways of Session 3.5
1. Triggers automate actions based on events like INSERT, UPDATE, or DELETE.
2. Triggers can enforce business rules, log changes, or maintain data integrity.
3. SQL Data Types define the kind of data stored in columns. Choosing the right type ensures efficiency and accuracy.
4. Use triggers and data types effectively to build robust databases.
Next Steps:
● Practice creating triggers and designing tables with appropriate data types.
● Explore advanced topics like indexing in future sessions.
Questions?
Topic 3.6: Review of Terms - Physical and Logical Records, Blocking Factor, Pinned and Unpinned Organization, Heap Files, Indexing
Module 3
What We’ll Cover in This Session:
1. Introduction to Physical and Logical Records
● Definitions and Differences
2. Blocking Factor
● What is it? Why is it important?
3. Pinned vs. Unpinned Organization
● Characteristics and Use Cases
4. Heap Files
● Structure and Behavior
5. Indexing
● Overview and Importance
6. Recap and Key Takeaways
Physical vs. Logical Records
Definition:
● Logical Record: A record as perceived by the user or application (e.g., a row in a table).
● Physical Record: The unit in which data is stored on and transferred from disk (e.g., a block or page); it may hold several logical records.
Example:
A logical record might be a single employee record, while the physical record could store multiple employee records in a block on disk.
Blocking Factor
Definition:
● The blocking factor is the number of logical records that can fit into a single physical block (or page) on disk.
Formula:
Blocking Factor = floor(Block Size / Record Size)
(Round down, assuming records are not split across blocks.)
Importance:
● Maximizes disk space utilization.
● Reduces I/O operations by reading/writing multiple records at once.
Example:
If a block size is 4 KB and each record is 1 KB, the blocking factor is 4 (4 records per block).
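The arithmetic can be checked directly in SQL. A minimal sketch, taking 1 KB as 1024 bytes and assuming records are not split across blocks; the 200-byte case is an extra illustration of a record size that does not divide the block evenly.

```sql
-- All sizes in bytes.
SELECT FLOOR(4096 / 1024)             AS bf_1kb_records,      -- 4 records per block
       FLOOR(4096 / 200)              AS bf_200_byte_records, -- 20 records per block
       4096 - FLOOR(4096 / 200) * 200 AS unused_bytes;        -- 96 bytes left over per block
```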
Pinned vs. Unpinned Organization
Pinned Organization:
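● Records are fixed in specific locations on disk, typically because other structures (such as index entries) hold pointers to those locations.
● Useful for systems requiring predictable access patterns.
Unpinned Organization:
● Records can move around on disk (e.g., during reorganization).
● More flexible but may require additional overhead to track record locations.
Comparison:
● Pinned records keep stable addresses but cannot be relocated freely; unpinned records can be relocated, at the cost of tracking where they are.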
Heap Files
Definition:
● A heap file is an unsorted collection of records stored on disk.
● Records are inserted without any specific order.
Characteristics:
● Simple to implement.
● Fast for insertions since no sorting or indexing is required.
● Slow for searches, since finding a record requires scanning the file sequentially.
Example:
A heap file might store employee records in the order they were added, regardless of their IDs or names.
Indexing
Definition:
● Indexing is a technique used to improve the speed of data retrieval operations on a database.
● An index is a data structure (e.g., B-tree, hash table) that maps keys to record locations.
Types of Indexes:
1. Primary Index: Built on the primary key.
2. Secondary Index: Built on non-primary key columns.
3. Clustered Index: Determines the physical order of data.
4. Non-Clustered Index: Stores a separate structure pointing to data.
Benefits of Indexing:
● Faster query performance.
● Enables efficient range queries and sorting (with ordered structures such as B-trees).
Drawbacks:
● Increases storage requirements.
● Slows down insertions and updates due to index maintenance.
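In SQL, a secondary index is created with CREATE INDEX. A minimal sketch on the Employees table from earlier sessions; the index name is hypothetical and the DROP INDEX form shown is MySQL's.

```sql
-- Secondary index on a non-key column that is searched often.
CREATE INDEX idx_employees_lastname ON Employees (LastName);

-- Queries that filter or sort on LastName can now use the index.
SELECT EmployeeID, FirstName, LastName
FROM Employees
WHERE LastName = 'Rao';

-- EXPLAIN shows whether the optimizer actually chose the index.
EXPLAIN
SELECT EmployeeID, FirstName, LastName
FROM Employees
WHERE LastName = 'Rao';

-- Drop the index if its write overhead outweighs the read benefit (MySQL syntax).
DROP INDEX idx_employees_lastname ON Employees;
```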
Practical Exercise
Task 1:
Calculate the blocking factor for a system where:
● Block size = 8 KB
● Record size = 2 KB
Hint:
Use the formula:
Blocking Factor = Block Size / Record Size
Task 2:
Explain why heap files are inefficient for large-scale databases with frequent search operations.
Hint:
Consider the sequential scanning process and lack of ordering.
Recap and Key Takeaways 3.6
4. Heap Files store records without order, making them simple but inefficient for searches.
5. Indexing improves query performance by organizing data for faster retrieval.
Next Steps:
● Explore advanced indexing techniques like B-Trees and hashing in future sessions.
Questions?
Topic 3.7: Single-Level Indices, Numerical Examples
Module 3
What We’ll Cover in This Session:
1. Introduction to Indexing
● Why do we need indices?
2. Types of Single-Level Indices
● Primary Index
● Secondary Index
3. Numerical Examples
● Calculating Index Size and Search Efficiency
4. Practical Exercises
5. Recap and Key Takeaways
What is Indexing?
Definition:
● An index is a data structure that improves the speed of data retrieval operations on a database table.
● It works like an index in a book, allowing quick access to specific data without scanning the entire table.
Single-Level Indices
Definition:
● A single-level index is an index that uses a single level of entries to map keys to record locations.
● It is simpler than multi-level indices but may not scale well for very large datasets.
Types of Single-Level Indices:
● Primary Index
● Secondary Index
Primary Index
Definition:
● A primary index is built on the primary key of a table.
● It assumes that records are stored in sorted order by the primary key.
Structure:
● Each entry in the index contains:
● Key value (e.g., primary key).
● Pointer to the block where the record is stored.
Advantages:
● Efficient for range queries.
● Reduces the number of disk accesses.
Example: Suppose we have a table with 1000 records sorted by EmployeeID. The primary index holds one entry per data block: the EmployeeID of the first record in that block and a pointer to the block.
Secondary Index
Definition:
● A secondary index is built on non-primary key columns.
● It allows indexing on fields other than the primary key.
Structure:
● Each entry in the index contains:
● Key value (e.g., a secondary column like LastName).
● Pointer to the record location.
Advantages:
● Enables fast searches on non-primary key columns.
● Useful for tables with multiple search criteria.
Disadvantages:
● Requires additional storage.
● May slow down insertions and updates.
Example: Suppose we want to index employees by LastName. The secondary index holds one entry per record: the LastName value and a pointer to that record's location.
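Numerical Example - Primary Index
Numerical Example - Secondary Index
Scenario:
The same file has a secondary index on LastName, with each index entry being 20 bytes. Calculate:
1. Number of index entries.
2. Size of the secondary index.
Solution:
1. Number of Index Entries: A secondary index has one entry per record, so this equals the number of records in the file.
2. Size of the Secondary Index: Number of Index Entries × 20 bytes.
Practical Exercise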
Task 1:
A file contains 5000 records, each of size 200 bytes. The block size is 4 KB. Calculate:
1. Blocking factor.
2. Number of blocks needed to store the file.
3. Size of the primary index if each index entry is 12 bytes.
Task 2:
If the same file has a secondary index on DepartmentID, with each index entry being 18 bytes, calculate the size of the secondary index.
Topic 3.8: Multi-Level Indices, Numerical Examples
What We’ll Cover in This Session:
1. Introduction to Multi-Level Indices
● Why do we need multi-level indices?
2. Structure of Multi-Level Indices
● How they work and their advantages
3. Numerical Examples
● Calculating index levels and search efficiency
4. Practical Exercises
5. Recap and Key Takeaways
What are Multi-Level Indices?
Definition:
● A multi-level index is an indexing technique that uses multiple levels of indices to map keys to record locations.
● It is used to overcome the limitations of single-level indices when dealing with very large datasets.
Why Use Multi-Level Indices?
● Reduces the number of disk accesses required for searching.
● Scales better than single-level indices for large datasets.
Key Characteristics:
● The top level contains pointers to the next level.
● The bottom level points to actual data blocks.
Structure of Multi-Level Indices
Explanation:
● A multi-level index is like a tree structure where each level reduces the search space.
Example:
Suppose we have a file with 1 million records. The bottom index level points to the file's data blocks, and each level above it indexes the blocks of the level below, until the top level fits in a single block.
Advantages of Multi-Level Indices
1. Efficient Search:
● Reduces the number of disk accesses by narrowing down the search space at each level.
2. Scalability:
● Handles very large datasets more effectively than single-level indices.
3. Flexibility:
● Can be combined with other indexing techniques like B-Trees.
Disadvantages:
● Increased storage requirements due to multiple levels.
● More complex to implement and maintain.
Numerical Example - Multi-Level Index
Scenario:
A file contains 1,000,000 records, each of size 200 bytes. The block size is 4 KB. Each index entry is 12 bytes. Calculate:
1. Blocking factor for data blocks.
2. Number of blocks needed to store the file.
3. Number of levels in the multi-level index.
Solution:
1. Blocking Factor:
Blocking Factor = floor(Block Size / Record Size) = floor(4096 / 200) = 20 records per block
2. Number of Blocks:
Number of Blocks = Total Records / Blocking Factor = 1,000,000 / 20 = 50,000 blocks
Numerical Example - Multi-Level Index (Continued)
Scenario:
3. Number of levels in the multi-level index.
Solution 3:
Number of Levels in Multi-Level Index:
Each index block can hold:
Entries per Index Block = floor(Block Size / Entry Size) = floor(4096 / 12) = 341 entries
Bottom Level: Points to 50,000 data blocks. Requires:
Blocks in Bottom Level = ceil(50,000 / 341) = 147 blocks
Top Level: Points to the 147 bottom-level blocks. Requires:
Blocks in Top Level = ceil(147 / 341) = 1 block
Because the top level fits in a single block, it is the root and no further level is needed. The index has 2 levels.
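The level calculation can be reproduced in SQL as a check. A minimal sketch using the figures from this example; the column aliases are illustrative, and the logarithm line is the closed-form count of index levels, ceil(log base 341 of 50,000).

```sql
SELECT FLOOR(4096 / 12)             AS entries_per_index_block, -- 341
       CEILING(50000.0 / 341)       AS bottom_level_blocks,     -- 147
       CEILING(147.0 / 341)         AS next_level_blocks,       -- 1, so this block is the root
       CEILING(LN(50000) / LN(341)) AS index_levels;            -- 2
```

A level that fits in one block is the top of the index, so the calculation stops there.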
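Scenario:
How many disk accesses are required to retrieve a record using a multi-level index?
Explanation:
● Each level reduces the search space.
● For the previous example, the 147 bottom-level blocks are indexed by a single upper-level block, which is therefore the root:
● Access the root block (1 access).
● Access the bottom-level block (1 access).
● Access the data block (1 access).
Total Disk Accesses: 3
Comparison with Single-Level Index:
● A single-level index for the same file occupies 147 blocks: a binary search over it needs about ceil(log2(147)) = 8 index block accesses plus 1 data block access, and a sequential scan of it could read all 147 blocks in the worst case.
● With no index at all, a linear search could read all 50,000 data blocks in the worst case.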
Practical Exercise
Task 1:
A file contains 2,000,000 records, each of size 150 bytes. The block size is 8 KB. Each index entry is 16 bytes. Calculate:
1. Blocking factor for data blocks.
2. Number of blocks needed to store the file.
3. Number of levels in the multi-level index.
Task 2:
If the same file uses a single-level index, how many disk accesses are required in the worst case? Compare it with the
multi-level index.
Recap and Key Takeaways
3. Numerical Examples demonstrate how multi-level indices improve search efficiency.
4. Multi-level indices trade off increased storage for faster retrieval.
Next Steps:
● Explore advanced indexing techniques like B-Trees and hashing in future sessions.
Questions?
Thank you