SQL Material

The document compares relational and non-relational databases, highlighting their use cases: relational databases are suited for structured data with defined relationships, while non-relational databases excel in handling unstructured data with varied attributes. It also explains SQL commands categorized into DDL, DML, and DQL, providing examples for creating, manipulating, and querying data. Additionally, it covers advanced SQL queries, aggregate functions, and data control language commands for managing database access and permissions.

Uploaded by

Abhishek Dutta
Copyright: © All Rights Reserved

Relational vs Non-Relational Database:

Scenario 1: Relational Database

Use Case: Managing Orders

When customers place orders, each order has a well-defined structure: an order ID, customer ID, list
of products, quantities, prices, order date, and so on. Here, we also have multiple relationships to
track — for example:

Each customer can place multiple orders.

Each order can include multiple products.

Because of these structured relationships, a relational database (like MySQL, PostgreSQL, or Oracle)
is ideal. This database can use tables to store data:

Customer table for customer information.

Order table to store each order's details.

Product table for product information.

Order_Product table to handle the many-to-many relationship between orders and products.

With SQL queries, we can easily retrieve data like, “Show all orders by a specific customer” or “List
products purchased within a date range.” The structure and relationships make relational databases
highly reliable and efficient for such structured, well-defined data.
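
As a minimal sketch of the tables described above, the following uses Python's built-in sqlite3 module as a stand-in for MySQL, PostgreSQL, or Oracle. The column names and sample rows are assumptions made for illustration, and the order table is named Orders because ORDER is a reserved word.

```python
import sqlite3

# In-memory SQLite database. Table/column names and rows are illustrative
# assumptions; "Orders" is used because ORDER is a reserved word.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES Customer(customer_id),
    order_date  TEXT
);
-- Junction table: one row per (order, product) pair.
CREATE TABLE Order_Product (
    order_id   INTEGER REFERENCES Orders(order_id),
    product_id INTEGER REFERENCES Product(product_id),
    quantity   INTEGER,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO Customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO Product  VALUES (10, 'T-shirt'), (11, 'Mug');
INSERT INTO Orders   VALUES (100, 1, '2024-11-01'),
                            (101, 1, '2024-11-05'),
                            (102, 2, '2024-11-07');
INSERT INTO Order_Product VALUES (100, 10, 2), (100, 11, 1),
                                 (101, 11, 3), (102, 10, 1);
""")

# "Show all orders by a specific customer"
orders_for_customer = conn.execute(
    "SELECT order_id FROM Orders WHERE customer_id = ? ORDER BY order_id", (1,)
).fetchall()

# "List products purchased within a date range"
products_in_range = conn.execute(
    """
    SELECT DISTINCT p.name
    FROM Orders o
    JOIN Order_Product op ON op.order_id = o.order_id
    JOIN Product p        ON p.product_id = op.product_id
    WHERE o.order_date BETWEEN ? AND ?
    ORDER BY p.name
    """,
    ("2024-11-04", "2024-11-08"),
).fetchall()

print(orders_for_customer)  # [(100,), (101,)]
print(products_in_range)    # [('Mug',), ('T-shirt',)]
```

The junction table is what lets one order hold many products and one product appear in many orders without duplicating order or product details.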

Scenario 2: Non-Relational Database

Use Case: Storing Product Information

Our products may have varied attributes — some might have size and color, while others might have
expiration dates or manufacturer information. Since product attributes can vary widely, it’s hard to
fit this data into a fixed table structure without lots of empty or unnecessary fields.

A non-relational database (like MongoDB or Cassandra) is flexible enough to store this kind of
loosely structured data. In a document store such as MongoDB, we can store each product as a
document with its own set of fields. For example:

A T-shirt document might have fields for size, color, and material.

A Food item document might have expiration_date, ingredients, and manufacturer.

With this approach, each product can be represented in a way that best suits its specific data,
without rigid schema requirements. This flexibility is ideal for scenarios where data formats vary or
evolve over time.
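
To make the idea concrete, here is a small sketch that models two product documents as plain Python dictionaries, the shape a document store would typically accept as JSON. The field values, the _id and type fields, and the manufacturer name are invented for illustration; no real database driver is used.

```python
import json

# Two product "documents" with different fields (all values are illustrative).
products = [
    {"_id": 1, "type": "t-shirt", "name": "Plain Tee",
     "size": "M", "color": "blue", "material": "cotton"},
    {"_id": 2, "type": "food", "name": "Granola Bar",
     "expiration_date": "2025-06-30",
     "ingredients": ["oats", "honey"],
     "manufacturer": "Example Foods"},
]

# A query in this model filters on whichever fields a document happens to have.
perishable = [doc["name"] for doc in products if "expiration_date" in doc]

# Each document serializes to JSON on its own; no shared schema is required.
serialized = json.dumps(products[1])

print(perishable)  # ['Granola Bar']
```

Neither document carries empty placeholders for the other's fields, which is the property that a fixed table structure makes hard to achieve.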

DDL, DML, DQL: Scenario-Based Explanation


• DDL (Data Definition Language)

• It is a type of SQL command used to define and modify data structures (the schema),
not the data stored in them. It creates, alters, and deletes database objects such as
tables, views, indexes, and users.

• Examples of DDL statements include CREATE, ALTER, DROP and TRUNCATE.

• DML (Data Manipulation Language)

• It is a type of SQL command used to manipulate data in a database. It inserts,
updates, and deletes data from a database table.

• Examples of DML statements include INSERT, UPDATE, and DELETE.

• DQL (Data Query Language)

• It is used to retrieve data from a database. The main DQL command is SELECT,
which allows you to query or search through tables to find specific information.

• Example: SELECT query

=================================================================================

Example Dataset with All Scenarios

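We use a sample dataset for a hospital management system with the following tables: Patients,
Appointments, Doctors, and Departments. We’ll go through several business scenarios, posing
questions that might arise in each, and provide example SQL queries in DDL, DML, and DQL.
The examples mix dialects (for instance, sp_rename is SQL Server syntax while LIMIT is
MySQL/PostgreSQL syntax), so adapt them to your database.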

DDL Commands for given tables:


CREATE TABLE Patients (

patient_id INT PRIMARY KEY,

first_name VARCHAR(50),

last_name VARCHAR(50),

date_of_birth DATE,

gender VARCHAR(10),

phone VARCHAR(15),

address VARCHAR(100)

);

CREATE TABLE Departments (

department_id INT PRIMARY KEY,


department_name VARCHAR(100),

location VARCHAR(100)

);

CREATE TABLE Doctors (

doctor_id INT PRIMARY KEY,

first_name VARCHAR(50),

last_name VARCHAR(50),

specialty VARCHAR(50),

department_id INT,

FOREIGN KEY (department_id) REFERENCES Departments(department_id)

);

CREATE TABLE Appointments (

appointment_id INT PRIMARY KEY,

patient_id INT,

doctor_id INT,

appointment_date DATE,

appointment_time TIME,

status VARCHAR(20),

FOREIGN KEY (patient_id) REFERENCES Patients(patient_id),

FOREIGN KEY (doctor_id) REFERENCES Doctors(doctor_id)

);

Alter Table:
Add a new column to the Patients table to store email addresses for each patient

ALTER TABLE Patients

ADD email VARCHAR(100);

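How do we insert data into the new column?

Existing rows hold NULL in the new column; use UPDATE to fill it in (the email value here is
only an example):

UPDATE Patients SET email = 'alice.williams@example.com' WHERE patient_id = 1;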

Increase the length of the phone column in the Patients table to store international phone
numbers.

ALTER TABLE Patients

ALTER COLUMN phone VARCHAR(20);

Rename the location column in the Departments table to department_location for clarity

EXEC sp_rename 'Departments.location', 'department_location', 'COLUMN';

DML Commands: Inserting Records into the Tables


Inserting data into the above tables:

Syntax:
INSERT INTO table_name (col1, col2, col3) VALUES (value1, value2, value3);

INSERT INTO Departments (department_id, department_name, department_location) VALUES

(1, 'Cardiology', 'First Floor'),

(2, 'Neurology', 'Second Floor'),

(3, 'Orthopedics', 'Third Floor'),

(4, 'Pediatrics', 'Ground Floor'),

(5, 'Oncology', 'Second Floor'),

(6, 'Radiology', 'Basement'),

(7, 'Emergency', 'Ground Floor'),

(8, 'Dermatology', 'Fourth Floor'),

(9, 'Gastroenterology', 'Third Floor'),

(10, 'Urology', 'Fourth Floor');


INSERT INTO Doctors (doctor_id, first_name, last_name, specialty, department_id) VALUES

(101, 'John', 'Doe', 'Cardiology', 1),

(102, 'Jane', 'Smith', 'Neurology', 2),

(103, 'Emily', 'Brown', 'Orthopedics', 3),

(104, 'Michael', 'Johnson', 'Pediatrics', 4),

(105, 'Anna', 'Taylor', 'Oncology', 5),

(106, 'David', 'Wilson', 'Radiology', 6),

(107, 'Laura', 'Martinez', 'Emergency', 7),

(108, 'Chris', 'Lee', 'Dermatology', 8),

(109, 'Sara', 'Lopez', 'Gastroenterology', 9),

(110, 'Tom', 'Clark', 'Urology', 10);

INSERT INTO Patients (patient_id, first_name, last_name, date_of_birth, gender, phone, address)
VALUES

(1, 'Alice', 'Williams', '1985-05-12', 'Female', '123-456-7890', '123 Maple St'),

(2, 'Bob', 'Johnson', '1970-11-23', 'Male', '987-654-3210', '456 Oak St'),

(3, 'Charlie', 'Davis', '1995-02-18', 'Male', '555-555-5555', '789 Pine St'),

(4, 'Diana', 'Miller', '1988-03-25', 'Female', '222-333-4444', '101 Elm St'),

(5, 'Ethan', 'Clark', '1979-08-15', 'Male', '333-444-5555', '102 Birch St'),

(6, 'Fiona', 'Lopez', '1992-06-30', 'Female', '444-555-6666', '103 Cedar St'),

(7, 'George', 'Brown', '1965-10-10', 'Male', '555-666-7777', '104 Spruce St'),

(8, 'Hannah', 'Wilson', '2000-12-20', 'Female', '666-777-8888', '105 Fir St'),

(9, 'Ian', 'Martinez', '1998-07-18', 'Male', '777-888-9999', '106 Ash St'),

(10, 'Jane', 'Taylor', '1991-09-09', 'Female', '888-999-0000', '107 Willow St');

INSERT INTO Appointments (appointment_id, patient_id, doctor_id, appointment_date,
appointment_time, status) VALUES

(1001, 1, 101, '2024-11-15', '10:00', 'Scheduled'),

(1002, 2, 102, '2024-11-16', '14:30', 'Scheduled'),

(1003, 3, 103, '2024-11-17', '09:00', 'Completed'),

(1004, 4, 104, '2024-11-18', '11:30', 'Scheduled'),


(1005, 5, 105, '2024-11-19', '15:00', 'Canceled'),

(1006, 6, 106, '2024-11-20', '08:30', 'Scheduled'),

(1007, 7, 107, '2024-11-21', '10:45', 'Completed'),

(1008, 8, 108, '2024-11-22', '13:00', 'Scheduled'),

(1009, 9, 109, '2024-11-23', '09:15', 'Scheduled'),

(1010, 10, 110, '2024-11-24', '12:00', 'Scheduled');

Update:
Update the location of the "Radiology" department to "First Floor."

UPDATE Departments

SET department_location = 'First Floor'

WHERE department_name = 'Radiology';

Change the department name of department_id = 10 from "Urology" to "Nephrology."

UPDATE Departments

SET department_name = 'Nephrology'

WHERE department_id = 10;

Relocate all departments currently on the "Second Floor" to the "Third Floor."

UPDATE Departments

SET department_location = 'Third Floor'

WHERE department_location = 'Second Floor';

Update the specialty of "Dr. John Doe" to "Cardiac Surgery."

UPDATE Doctors

SET specialty = 'Cardiac Surgery'

WHERE first_name = 'John' AND last_name = 'Doe';

Assign department_id = 7 to all doctors in the "Emergency" specialty.

UPDATE Doctors

SET department_id = 7

WHERE specialty = 'Emergency';


Delete Statements:
Deleting Records from Appointments

Delete a specific appointment that was mistakenly scheduled (e.g., appointment_id = 1005).

DELETE FROM Appointments

WHERE appointment_id = 1005;

Remove all canceled appointments from the Appointments table.

DELETE FROM Appointments

WHERE status = 'Canceled';

Delete all past appointments (appointments before today) from the Appointments table.

DELETE FROM Appointments

WHERE appointment_date < CURRENT_DATE;

Deleting Records from Patients

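Delete a specific patient record based on their ID (e.g., patient_id = 1). If any appointments
still reference this patient, the foreign key on Appointments rejects the delete, so remove
those appointments first.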

DELETE FROM Patients

WHERE patient_id = 1;

Remove all patient records where no phone number has been provided.

DELETE FROM Patients

WHERE phone IS NULL;

Remove doctors who are assigned to departments that no longer exist in the Departments table.

DELETE FROM Doctors

WHERE department_id NOT IN (SELECT department_id FROM Departments);


DQL Queries:
Appointment table
List all scheduled appointments.

SELECT * FROM Appointments WHERE status = 'Scheduled';

Find all appointments for a specific doctor (e.g., doctor_id = 101).

SELECT * FROM Appointments WHERE doctor_id = 101;

Manage Patient Records


List all patients born after 1990.

SELECT * FROM Patients WHERE date_of_birth > '1990-12-31';

Retrieve patients’ phone numbers who live on “Maple St”.

SELECT first_name, last_name, phone

FROM Patients

WHERE address LIKE '%Maple St%';

Department Insights:
List all department names and locations.

SELECT department_name, department_location FROM Departments;

Find the department with the highest number of doctors.


SELECT department_id, COUNT(doctor_id) AS doctor_count

FROM Doctors

GROUP BY department_id

ORDER BY doctor_count DESC

LIMIT 1;

LIMIT is MySQL/PostgreSQL syntax; in SQL Server, write SELECT TOP 1 ... instead.

DCL: Data Control Language:
Questions:
Grant read-only access to the Patients and Appointments tables for a user named
receptionist_user.

In SQL Server, first create a login and a database user (replace the placeholder with a
strong password):

CREATE LOGIN receptionist_user

WITH PASSWORD = '<strong_password>';

CREATE USER receptionist_user

FOR LOGIN receptionist_user;

GRANT SELECT ON Patients TO receptionist_user;

GRANT SELECT ON Appointments TO receptionist_user;

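Grant a doctor access to view and update appointment records. Assume
the doctor’s username is doctor_user. Note that a table-level GRANT applies to every row in
Appointments; limiting the doctor to their own appointments requires a filtered view or
row-level security.
GRANT SELECT, UPDATE ON Appointments TO doctor_user;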

Grant full access to the Doctors table to a department head user, allowing them to
manage doctor records in their department.
GRANT SELECT, INSERT, UPDATE, DELETE ON Doctors TO department_head_user;

Grant limited access to an external auditor to view only completed appointments
through a view.
CREATE VIEW CompletedAppointments AS
SELECT * FROM Appointments WHERE status = 'Completed';
GRANT SELECT ON CompletedAppointments TO auditor_user;

Order of Execution
Objective: Retrieve a list of doctors with the total number of completed appointments in 2024, only
including doctors who had at least 5 completed appointments in this period. The results should be
ordered by the total number of appointments in descending order.

SELECT

doctor_id,

COUNT(appointment_id) AS total_completed_appointments

FROM

Appointments

WHERE

status = 'Completed'

AND YEAR(appointment_date) = 2024

GROUP BY

doctor_id

HAVING

COUNT(appointment_id) >= 5

ORDER BY

total_completed_appointments DESC;

Explanation of Execution Order


In SQL, the execution order of a query doesn’t necessarily follow the order of clauses in the written
syntax. Here's the actual execution order of the clauses in this query:

FROM: The query starts by identifying the table from which to retrieve data (Appointments).

WHERE: The WHERE clause filters the rows based on the condition specified. Only rows where status
= 'Completed' and appointment_date is in the year 2024 are selected.

GROUP BY: After filtering, the GROUP BY clause groups the rows by doctor_id, so we can perform
aggregate functions on each group.

HAVING: The HAVING clause applies a filter to the groups created by GROUP BY. In this case, it keeps
only those groups (i.e., doctors) with a COUNT of appointment_id that is 5 or more.

SELECT: The SELECT clause is processed next. It retrieves doctor_id and the calculated
total_completed_appointments (the result of COUNT(appointment_id)).

ORDER BY: Finally, the results are sorted by total_completed_appointments in descending order.

Final Output

The result will be a list of doctors who have completed at least 5 appointments in 2024, with each
doctor’s ID and their total completed appointments, sorted from the highest to the lowest count.

This demonstrates SQL’s logical execution order, which is important for understanding how filters
and aggregations apply to data.
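
The logical order described above can be checked with a small sketch using Python's sqlite3 module. Two assumptions differ from the query above: SQLite has no YEAR() function, so strftime('%Y', ...) stands in for it, and the threshold is lowered from 5 to 2 so that a handful of invented rows is enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (
    appointment_id   INTEGER PRIMARY KEY,
    doctor_id        INTEGER,
    appointment_date TEXT,
    status           TEXT
);
-- Illustrative rows only.
INSERT INTO Appointments VALUES
 (1, 101, '2024-01-10', 'Completed'),
 (2, 101, '2024-02-11', 'Completed'),
 (3, 101, '2024-03-12', 'Completed'),
 (4, 102, '2024-01-15', 'Completed'),
 (5, 102, '2023-12-01', 'Completed'),
 (6, 102, '2023-12-02', 'Completed'),
 (7, 103, '2024-04-01', 'Scheduled'),
 (8, 103, '2024-04-02', 'Completed'),
 (9, 103, '2024-04-03', 'Completed');
""")

# WHERE filters rows first, GROUP BY groups what is left, HAVING filters groups,
# and ORDER BY runs last, which is why it may use the SELECT alias.
rows = conn.execute("""
SELECT doctor_id, COUNT(appointment_id) AS total_completed_appointments
FROM Appointments
WHERE status = 'Completed'
  AND strftime('%Y', appointment_date) = '2024'
GROUP BY doctor_id
HAVING COUNT(appointment_id) >= 2
ORDER BY total_completed_appointments DESC
""").fetchall()

# WHERE runs before any groups exist, so an aggregate there is rejected.
try:
    conn.execute(
        "SELECT doctor_id FROM Appointments WHERE COUNT(*) >= 2 GROUP BY doctor_id"
    )
    aggregate_in_where_failed = False
except sqlite3.Error:
    aggregate_in_where_failed = True

print(rows)  # [(101, 3), (103, 2)]
```

Doctor 102 has three completed appointments overall but only one in 2024, so the WHERE filter removes two of its rows before HAVING ever counts them.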
3. Advanced SQL Queries
Complex Select Queries:
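Scenario: Retrieve the patient name, doctor name, appointment date, and department for all
scheduled appointments, sorted by department and appointment date. If the appointment has a
note, include it; otherwise, show "No Notes Available."

Note: From here on, the queries assume Patients.patient_name, Doctors.doctor_name, and a
nullable Appointments.notes column. The CREATE TABLE statements above define first_name and
last_name instead and have no notes column, so adjust the schema or the column references
before running these queries.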

SELECT

P.patient_name AS Patient,

D.doctor_name AS Doctor,

A.appointment_date AS Appointment_Date,

Dept.department_name AS Department,

COALESCE(A.notes, 'No Notes Available') AS Notes

FROM

Appointments A

JOIN

Patients P ON A.patient_id = P.patient_id

JOIN

Doctors D ON A.doctor_id = D.doctor_id

JOIN

Departments Dept ON D.department_id = Dept.department_id


WHERE

A.status = 'Scheduled'

ORDER BY

Dept.department_name, A.appointment_date;

Explanation for above query:

Combines multiple tables using JOIN.

Uses COALESCE to handle NULL values in the notes column.

Filters only "Scheduled" appointments and sorts them by department and date.

SELECT DISTINCT
Scenario: Find all unique appointment statuses recorded in the hospital.

SELECT DISTINCT status

FROM Appointments;

Explanation:

The DISTINCT keyword ensures that each unique value of status appears only once in the result.

Using Aliases (AS):


Scenario: Show a simplified list of appointments with the patient's name, the doctor they are seeing,
and the date of the appointment. Use aliases to give descriptive names to columns.

SELECT

P.patient_name AS Patient_Name,

D.doctor_name AS Doctor_Name,

A.appointment_date AS Appointment_Date

FROM

Appointments A

JOIN

Patients P ON A.patient_id = P.patient_id

JOIN
Doctors D ON A.doctor_id = D.doctor_id;

Explanation:

AS assigns temporary names (aliases) to columns for better readability in the output.

Working with NULL Values


Scenario: Find all appointments where the notes field is not provided (i.e., NULL).

SELECT

appointment_id AS Appointment_ID,

patient_id AS Patient_ID,

doctor_id AS Doctor_ID,

appointment_date AS Appointment_Date

FROM

Appointments

WHERE

notes IS NULL;
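
A common mistake is to write notes = NULL instead of notes IS NULL. The sketch below, which uses Python's sqlite3 module and three invented rows, shows that the = comparison matches nothing, because any comparison with NULL evaluates to unknown rather than true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (appointment_id INTEGER PRIMARY KEY, notes TEXT);
-- Illustrative rows: one note present, two missing.
INSERT INTO Appointments VALUES (1, 'Bring X-ray'), (2, NULL), (3, NULL);
""")

# "= NULL" is never true, so no rows come back.
equals_null = conn.execute(
    "SELECT appointment_id FROM Appointments WHERE notes = NULL"
).fetchall()

# "IS NULL" is the correct test for a missing value.
is_null = conn.execute(
    "SELECT appointment_id FROM Appointments WHERE notes IS NULL "
    "ORDER BY appointment_id"
).fetchall()

# COALESCE substitutes a default for NULL in the output.
with_default = conn.execute(
    "SELECT appointment_id, COALESCE(notes, 'No Notes Available') "
    "FROM Appointments ORDER BY appointment_id"
).fetchall()

print(equals_null)  # []
print(is_null)      # [(2,), (3,)]
```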

Aggregate Functions and Grouping


COUNT, SUM, AVG, MIN, MAX
Definition:
These are aggregate functions used to perform calculations on a set of rows and return a
single value.

COUNT: Counts the number of rows or non-NULL values.


SUM: Calculates the total of numeric values.

AVG: Returns the average of numeric values.

MIN: Finds the smallest value.

MAX: Finds the largest value.

Scenario:

The hospital administration wants to understand key metrics for completed appointments,
including:

Total number of completed appointments.

Total revenue generated from these appointments.

The average fee charged per appointment.

The minimum and maximum fees recorded.

SELECT

COUNT(appointment_id) AS Total_Completed_Appointments,

SUM(fee) AS Total_Revenue,

AVG(fee) AS Average_Fee,

MIN(fee) AS Minimum_Fee,

MAX(fee) AS Maximum_Fee

FROM

Appointments

WHERE

status = 'Completed';

Explanation:

COUNT counts the number of completed appointments.


SUM adds up all the fee values.

AVG calculates the average of fee.

MIN and MAX return the smallest and largest fees respectively.

Useful for summarizing financial and operational performance.
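
One detail worth seeing in practice is how these functions treat NULL: COUNT(*) counts rows, while COUNT(fee), SUM, AVG, MIN, and MAX skip NULL fees. The following sketch uses Python's sqlite3 module with four invented rows, one of which is a completed appointment with no fee recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (
    appointment_id INTEGER PRIMARY KEY,
    status         TEXT,
    fee            INTEGER
);
-- Illustrative rows; appointment 3 is completed but has no fee recorded.
INSERT INTO Appointments VALUES
 (1, 'Completed', 150),
 (2, 'Completed', 300),
 (3, 'Completed', NULL),
 (4, 'Scheduled', 200);
""")

summary = conn.execute("""
SELECT COUNT(*), COUNT(fee), SUM(fee), AVG(fee), MIN(fee), MAX(fee)
FROM Appointments
WHERE status = 'Completed'
""").fetchone()

print(summary)  # (3, 2, 450, 225.0, 150, 300)
```

The average is 450 / 2, not 450 / 3, because the NULL fee is left out of both the sum and the divisor.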

GROUP BY Clause
Definition:

GROUP BY organizes rows into groups based on a column’s value.

Used with aggregate functions to perform calculations for each group separately.

Scenario: The hospital management wants to know how many appointments each doctor has
handled, grouped by their name.

SELECT

D.doctor_name AS Doctor_Name,

COUNT(A.appointment_id) AS Total_Appointments

FROM

Appointments A

JOIN

Doctors D ON A.doctor_id = D.doctor_id

GROUP BY

D.doctor_name;

Explanation:
● GROUP BY groups data by doctor names.
● COUNT calculates the number of appointments for each doctor.
● This is useful for understanding workload distribution among doctors.
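
Grouping by name alone merges doctors who happen to share a name. The sketch below, using Python's sqlite3 module and invented doctors, contrasts GROUP BY doctor_name with grouping by the primary key as well; the doctor_name column is the same assumption the query above makes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Doctors (doctor_id INTEGER PRIMARY KEY, doctor_name TEXT);
CREATE TABLE Appointments (appointment_id INTEGER PRIMARY KEY, doctor_id INTEGER);
-- Illustrative rows: doctors 1 and 2 are different people with the same name.
INSERT INTO Doctors VALUES (1, 'Dr. A Kumar'), (2, 'Dr. A Kumar'), (3, 'Dr. B Rao');
INSERT INTO Appointments VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# Grouping by name only: the two Dr. A Kumar rows collapse into one group.
by_name = conn.execute("""
SELECT D.doctor_name, COUNT(A.appointment_id)
FROM Appointments A
JOIN Doctors D ON A.doctor_id = D.doctor_id
GROUP BY D.doctor_name
ORDER BY D.doctor_name
""").fetchall()

# Grouping by the key (and name) keeps each doctor separate.
by_id = conn.execute("""
SELECT D.doctor_id, D.doctor_name, COUNT(A.appointment_id)
FROM Appointments A
JOIN Doctors D ON A.doctor_id = D.doctor_id
GROUP BY D.doctor_id, D.doctor_name
ORDER BY D.doctor_id
""").fetchall()

print(by_name)  # [('Dr. A Kumar', 3), ('Dr. B Rao', 1)]
print(by_id)    # [(1, 'Dr. A Kumar', 2), (2, 'Dr. A Kumar', 1), (3, 'Dr. B Rao', 1)]
```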

Having Clause:
Definition:

● HAVING filters groups created by GROUP BY, based on aggregate values.

● Similar to WHERE, but works on grouped (aggregated) data.

Scenario : Management wants to identify doctors who have handled more than 10 appointments so
far.

SELECT

D.doctor_name AS Doctor_Name,

COUNT(A.appointment_id) AS Total_Appointments

FROM

Appointments A

JOIN

Doctors D ON A.doctor_id = D.doctor_id

GROUP BY

D.doctor_name

HAVING

COUNT(A.appointment_id) > 10;

Explanation:
● GROUP BY groups data by each doctor’s name.

● HAVING filters groups to include only doctors with more than 10 appointments.

● Useful for identifying top performers or resource allocation.

Combined Example: COUNT, SUM, GROUP BY, and HAVING


Business Scenario: Department-Wise Revenue Analysis The hospital’s finance team wants to find
departments that have generated a total revenue exceeding $10,000, along with the number of
appointments handled.

SELECT

Dept.department_name AS Department_Name,

COUNT(A.appointment_id) AS Total_Appointments,

SUM(A.fee) AS Total_Revenue

FROM

Appointments A

JOIN

Doctors D ON A.doctor_id = D.doctor_id

JOIN

Departments Dept ON D.department_id = Dept.department_id

GROUP BY

Dept.department_name

HAVING

SUM(A.fee) > 10000;


Explanation:

● GROUP BY groups data by department names.

● SUM calculates the total revenue per department.

● COUNT counts the appointments per department.

● HAVING filters out departments where total revenue is ≤ $10,000.

● Useful for financial analysis and performance tracking.

When to Use These Clauses:


COUNT, SUM, AVG, MIN, MAX:

To get summarized metrics, like total revenue, average consultation time, or the range of fees.

GROUP BY Clause:

To break data into categories or groups (e.g., by doctor, department, or patient demographics).

HAVING Clause:

To filter results of grouped data based on aggregate values, such as identifying departments with
high revenue or doctors with many appointments.

JOINS:

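Below are the INSERT statements used in the join examples. They assume empty tables (the IDs
repeat those of the earlier dataset), a Patients table with patient_name and age columns, and
a Doctors table with a doctor_name column, matching the column names used in the join
queries. With the foreign keys in place, load Departments and Doctors before Appointments.

Patients Table
INSERT INTO Patients (patient_id, patient_name, gender, age) VALUES

(1, 'Alice Williams', 'Female', 29),

(2, 'Bob Johnson', 'Male', 45),

(3, 'Charlie Smith', 'Male', 33),

(4, 'Daisy Brown', 'Female', 40),

(5, 'Eve Black', 'Female', 27);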

Appointments Table

ALTER TABLE Appointments ADD fee INT;

INSERT INTO Appointments (appointment_id, patient_id, doctor_id, appointment_date, status, fee)


VALUES

(1001, 1, 101, '2024-11-10', 'Scheduled', 200),

(1002, 2, 102, '2024-11-12', 'Completed', 150),

(1003, 3, 103, '2024-11-13', 'Canceled', NULL),

(1004, NULL, 104, '2024-11-15', 'Scheduled', 250),

(1005, 5, NULL, '2024-11-16', 'Completed', 300);

Doctors Table
ALTER TABLE Doctors ADD experience_years INT;

INSERT INTO Doctors (doctor_id, doctor_name, department_id, experience_years) VALUES

(101, 'Dr. John Doe', 1, 10),

(102, 'Dr. Emily Brown', 2, 8),

(103, 'Dr. Anna White', 1, 15),

(104, 'Dr. David Grey', 3, 5);

1. INNER JOIN
Definition:

● An INNER JOIN retrieves rows that have matching values in both tables based on the
specified condition.

Business Scenario: Patients with Scheduled Appointments

The hospital wants a list of patients who have scheduled appointments, including their names,
appointment dates, and fees.

Query:

SELECT
P.patient_name AS Patient_Name,

A.appointment_date AS Appointment_Date,

A.fee AS Appointment_Fee

FROM

Patients P

INNER JOIN

Appointments A ON P.patient_id = A.patient_id

WHERE

A.status = 'Scheduled';

Explanation:

● Combines Patients and Appointments tables where patient_id matches in both tables.

● Filters appointments with a status of "Scheduled."

● Unmatched rows (e.g., appointments without a patient) are excluded.

Example Output:

Patient_Name Appointment_Date Appointment_Fee

Alice Williams 2024-11-10 200

2. LEFT JOIN and RIGHT JOIN


Definition:

● A LEFT JOIN retrieves all rows from the left table and the matching rows from the right table.
Rows from the left table with no match will show NULL for right table columns.

● A RIGHT JOIN is the opposite: all rows from the right table and the matching rows from the
left table.

Business Scenario: All Patients and Their Appointments

The hospital wants to list all patients, including those who haven’t booked any appointments.

Query (LEFT JOIN):

SELECT

P.patient_name AS Patient_Name,

A.appointment_date AS Appointment_Date,

A.status AS Appointment_Status
FROM

Patients P

LEFT JOIN

Appointments A ON P.patient_id = A.patient_id;

Explanation:

● Lists all patients (Patients is the left table).

● Includes NULL for patients without appointments.

Example Output:

Patient_Name      Appointment_Date   Appointment_Status
Alice Williams    2024-11-10         Scheduled
Bob Johnson       2024-11-12         Completed
Charlie Smith     2024-11-13         Canceled
Daisy Brown       NULL               NULL
Eve Black         2024-11-16         Completed

Business Scenario: Appointments Without Patients

The management wants to list all appointments, including those that aren’t associated with any
patient.

Query (RIGHT JOIN):

SELECT

A.appointment_id AS Appointment_ID,

P.patient_name AS Patient_Name,

A.appointment_date AS Appointment_Date

FROM

Patients P

RIGHT JOIN

Appointments A ON P.patient_id = A.patient_id;

Example Output:

Appointment_ID   Patient_Name      Appointment_Date
1001             Alice Williams    2024-11-10
1002             Bob Johnson       2024-11-12
1003             Charlie Smith     2024-11-13
1004             NULL              2024-11-15
1005             Eve Black         2024-11-16

3. FULL OUTER JOIN


Definition:

● A FULL OUTER JOIN retrieves all rows from both tables, matching rows where possible.
Unmatched rows from either table will have NULL values for missing columns.

Business Scenario: All Patients and All Appointments

The hospital wants a complete list of all patients and all appointments, showing matches where
possible.

Query:

SELECT

P.patient_name AS Patient_Name,

A.appointment_date AS Appointment_Date,

A.status AS Appointment_Status

FROM

Patients P

FULL OUTER JOIN


Appointments A ON P.patient_id = A.patient_id;

Example Output:

Patient_Name Appointment_Date Appointment_Status

Alice Williams 2024-11-10 Scheduled

Daisy Brown NULL NULL

NULL 2024-11-15 Scheduled

4. Self-Joins

Definition:

● A SELF-JOIN is when a table is joined to itself. Often used to compare rows within the same
table.

Business Scenario: Manager-Staff Relationships


The hospital wants to know which staff members report to whom.

Query:

SELECT

S1.staff_name AS Staff_Name,

S2.staff_name AS Manager_Name

FROM

Staff S1

LEFT JOIN

Staff S2 ON S1.manager_id = S2.staff_id;

Explanation:

● The Staff table is joined to itself using manager_id and staff_id.


● Useful for hierarchical relationships like reporting structures.

Example Output:

Staff_Name Manager_Name

Laura Smith John Carter

Michael Brown John Carter

Sarah Taylor Michael Brown
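
The self-join can be tried against the Staff rows that this document defines later, in the subqueries section. The sketch below loads those rows into an in-memory SQLite database through Python's sqlite3 module, so its output uses those names rather than the illustrative ones in the example output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (
    staff_id   INTEGER PRIMARY KEY,
    staff_name TEXT NOT NULL,
    role       TEXT,
    manager_id INTEGER REFERENCES Staff(staff_id)
);
INSERT INTO Staff VALUES
 (10, 'Mary Johnson',    'Director',             NULL),
 (11, 'James Carter',    'Admin Manager',        10),
 (12, 'Sophia Brown',    'Receptionist',         11),
 (13, 'Ethan Hill',      'Technician',           11),
 (14, 'Oliver Gray',     'Nurse',                10),
 (15, 'Isabella Evans',  'Pharmacist',           10),
 (16, 'Lucas Martin',    'Admin Assistant',      11),
 (17, 'Emma Lewis',      'Receptionist',         12),
 (18, 'Liam Walker',     'Technician Assistant', 13),
 (19, 'Charlotte Scott', 'Nurse Assistant',      14);
""")

# LEFT JOIN keeps staff with no manager (the Director) and shows NULL for them.
reports = conn.execute("""
SELECT S1.staff_name, S2.staff_name
FROM Staff S1
LEFT JOIN Staff S2 ON S1.manager_id = S2.staff_id
ORDER BY S1.staff_id
""").fetchall()

# An INNER JOIN would drop the Director, who has no manager row to match.
inner_count = conn.execute("""
SELECT COUNT(*)
FROM Staff S1
INNER JOIN Staff S2 ON S1.manager_id = S2.staff_id
""").fetchone()[0]

print(reports[:3])
# [('Mary Johnson', None), ('James Carter', 'Mary Johnson'), ('Sophia Brown', 'James Carter')]
print(inner_count)  # 9
```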

5. CROSS JOIN

Definition:

● A CROSS JOIN produces a Cartesian product of two tables, pairing every row from the first
table with every row from the second.

Business Scenario: Doctor-Department Combinations

The hospital wants to see all possible combinations of doctors and departments to explore possible
rotations.

Query:

SELECT

D.doctor_name AS Doctor_Name,

Dept.department_name AS Department_Name

FROM

Doctors D

CROSS JOIN

Departments Dept;

Explanation:

● Every doctor is paired with every department.

● Useful for generating combinations when testing scenarios or generating all possible
options.

Example Output:

Doctor_Name Department_Name

Dr. John Doe Cardiology

Dr. John Doe Orthopedics


Dr. Emily Brown Neurology

Summary

JOIN Type         Scenario                                       Purpose

INNER JOIN        Match rows in both tables                      Only include rows with matches.

LEFT JOIN         Include all rows from the left table           Include unmatched rows from the left table.

RIGHT JOIN        Include all rows from the right table          Include unmatched rows from the right table.

FULL OUTER JOIN   Include all rows from both tables              Combine matched and unmatched rows from both tables.

SELF-JOIN         Compare rows within the same table             Explore hierarchical or relationship data.

CROSS JOIN        Generate all combinations between two tables   Test scenarios or explore options.
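
A quick way to compare the join types is to count the rows each one returns for the five patients and five appointments used in the join examples. The sketch below does this with Python's sqlite3 module. Because SQLite versions before 3.39 lack RIGHT and FULL OUTER JOIN, the right join is written as a LEFT JOIN with the tables swapped and the full outer count is derived arithmetically; the patient_name column is the same assumption the join queries make.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (patient_id INTEGER PRIMARY KEY, patient_name TEXT);
CREATE TABLE Appointments (
    appointment_id   INTEGER PRIMARY KEY,
    patient_id       INTEGER,
    appointment_date TEXT
);
INSERT INTO Patients VALUES
 (1, 'Alice Williams'), (2, 'Bob Johnson'), (3, 'Charlie Smith'),
 (4, 'Daisy Brown'), (5, 'Eve Black');
INSERT INTO Appointments VALUES
 (1001, 1, '2024-11-10'), (1002, 2, '2024-11-12'), (1003, 3, '2024-11-13'),
 (1004, NULL, '2024-11-15'), (1005, 5, '2024-11-16');
""")

def count_rows(join_sql: str) -> int:
    """Count the rows produced by a join written as a FROM clause body."""
    return conn.execute(f"SELECT COUNT(*) FROM {join_sql}").fetchone()[0]

on = "ON P.patient_id = A.patient_id"
inner = count_rows(f"Patients P INNER JOIN Appointments A {on}")
left = count_rows(f"Patients P LEFT JOIN Appointments A {on}")
# Right join emulated by swapping the tables of a LEFT JOIN.
right = count_rows(f"Appointments A LEFT JOIN Patients P {on}")
cross = count_rows("Patients P CROSS JOIN Appointments A")
# Full outer = matched rows + unmatched left rows + unmatched right rows.
full_outer = left + right - inner

print(inner, left, right, full_outer, cross)  # 4 5 5 6 25
```

Daisy Brown (no appointment) accounts for the extra row in the left join, and appointment 1004 (no patient) for the extra row in the right join.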

4. Subqueries and Nested Queries

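DDL and DML statements for these topics, including the new tables they require. Because of the
foreign keys, run the inserts in dependency order: Departments, Doctors, Patients, then
Appointments.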
Patients

INSERT INTO Patients (patient_id, patient_name, gender, age) VALUES

(11, 'Laura Parker', 'Female', 35),

(12, 'Matt Hunter', 'Male', 41),

(13, 'Sophia Allen', 'Female', 30),

(14, 'Ethan Harris', 'Male', 50),

(15, 'Emily Moore', 'Female', 28),

(16, 'Daniel Green', 'Male', 45),

(17, 'Isla Turner', 'Female', 36),

(18, 'Oliver Brooks', 'Male', 39),

(19, 'Chloe Ross', 'Female', 33),

(20, 'Noah Baker', 'Male', 47);

Appointments
INSERT INTO Appointments (appointment_id, patient_id, doctor_id, appointment_date, status, fee)
VALUES

(3001, 11, 301, '2024-12-10', 'Completed', 400),

(3002, 12, 302, '2024-12-11', 'Scheduled', 250),

(3003, 13, 303, '2024-12-12', 'Completed', 300),

(3004, 14, 304, '2024-12-13', 'Canceled', NULL),

(3005, 15, NULL, '2024-12-14', 'Scheduled', 200),

(3006, 16, 305, '2024-12-15', 'Completed', 350),

(3007, 17, 306, '2024-12-16', 'Scheduled', 300),

(3008, NULL, 307, '2024-12-17', 'Canceled', NULL),

(3009, 19, NULL, '2024-12-18', 'Completed', 450),

(3010, 20, 308, '2024-12-19', 'Scheduled', 500);

Doctors Table

INSERT INTO Doctors (doctor_id, doctor_name, department_id, experience_years) VALUES

(301, 'Dr. Fiona Black', 9, 8),

(302, 'Dr. George White', 10, 12),

(303, 'Dr. Hannah Grey', 9, 5),

(304, 'Dr. Ian Scott', 11, 7),

(305, 'Dr. Jane Cooper', 10, 15),

(306, 'Dr. Kevin Reed', 12, 4),

(307, 'Dr. Laura Brown', 11, 9),

(308, 'Dr. Michael Adams', 13, 10),

(309, 'Dr. Olivia Clark', 12, 6),

(310, 'Dr. Peter Hall', 13, 14);

Departments Table

INSERT INTO Departments (department_id, department_name) VALUES

(9, 'Oncology'),

(10, 'Cardiothoracic Surgery'),


(11, 'Pediatrics'),

(12, 'Nephrology'),

(13, 'Gastroenterology');

Staff Table:

CREATE TABLE Staff (

staff_id INT PRIMARY KEY,

staff_name VARCHAR(100) NOT NULL,

role VARCHAR(50),

manager_id INT NULL,

FOREIGN KEY (manager_id) REFERENCES Staff(staff_id)

);

INSERT INTO Staff (staff_id, staff_name, role, manager_id) VALUES

(10, 'Mary Johnson', 'Director', NULL),

(11, 'James Carter', 'Admin Manager', 10),

(12, 'Sophia Brown', 'Receptionist', 11),

(13, 'Ethan Hill', 'Technician', 11),

(14, 'Oliver Gray', 'Nurse', 10),

(15, 'Isabella Evans', 'Pharmacist', 10),

(16, 'Lucas Martin', 'Admin Assistant', 11),

(17, 'Emma Lewis', 'Receptionist', 12),

(18, 'Liam Walker', 'Technician Assistant', 13),

(19, 'Charlotte Scott', 'Nurse Assistant', 14);

Basic Subqueries
Definition:

A subquery is a query inside another query. Subqueries help break down complex logic into smaller,
manageable pieces.

Types:
● Single-Row Subquery: Returns only one value.
● Multiple-Row Subquery: Returns multiple values.

1. Single-Row Subquery

Scenario 1 (Simple):

Find the department of the doctor with the highest experience.

Query:

SELECT department_name
FROM Departments
WHERE department_id = (
SELECT department_id
FROM Doctors
WHERE experience_years = (SELECT MAX(experience_years) FROM Doctors)
);

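Explanation:

1. The innermost subquery calculates the maximum experience (MAX(experience_years)).
2. The middle subquery finds the department ID of the doctor with this experience.
3. The outer query fetches the department name.

Because the outer comparison uses =, the middle subquery must return exactly one row. If two
doctors tie for the highest experience, most databases raise an error; use IN instead of = to
handle ties.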

Scenario 2 (Medium):

Find the patient name who has paid the highest appointment fee.

Query:

SELECT patient_name
FROM Patients
WHERE patient_id = (
SELECT patient_id
FROM Appointments
WHERE fee = (SELECT MAX(fee) FROM Appointments)
);
Explanation:

● The innermost subquery finds the maximum fee.


● The middle subquery fetches the patient ID associated with this fee.
● The outer query retrieves the patient name.

2. Multiple-Row Subqueries

Scenario 1 (Simple):

List the names of patients who have appointments (in any status) with doctors in the "Oncology"
department.

Query:

SELECT patient_name
FROM Patients
WHERE patient_id IN (
SELECT patient_id
FROM Appointments
WHERE doctor_id IN (
SELECT doctor_id
FROM Doctors
WHERE department_id = (SELECT department_id FROM Departments WHERE
department_name = 'Oncology')
)
);

Explanation:

1. The inner subquery retrieves doctors from the "Oncology" department.


2. The middle subquery fetches patients who have appointments with those doctors.
3. The outer query fetches patient names.

Scenario 2 (Medium):

Find all departments that have at least one doctor with two or more scheduled appointments.


Query:

SELECT department_name
FROM Departments
WHERE department_id IN (
SELECT department_id
FROM Doctors
WHERE doctor_id IN (
SELECT doctor_id
FROM Appointments
WHERE status = 'Scheduled'
GROUP BY doctor_id
HAVING COUNT(*) >= 2
)
);

Explanation:

● The innermost subquery groups appointments by doctor and checks for at least two
scheduled appointments.
● The second subquery identifies departments of these doctors.
● The outer query lists department names.

Correlated Subqueries
Definition:

In a correlated subquery, the inner query refers to columns from the outer query. This subquery runs
once for each row of the outer query.

Scenario 1 (Simple):

Find doctors who have handled more than one appointment.

Query:

SELECT doctor_name
FROM Doctors D
WHERE EXISTS (
SELECT 1
FROM Appointments A
WHERE A.doctor_id = D.doctor_id
GROUP BY A.doctor_id
HAVING COUNT(*) > 1
);

Explanation:

1. The subquery groups appointments by doctor ID and checks for those with more than one
appointment.
2. The EXISTS keyword filters only doctors matching this condition.

Scenario 2 (Medium):

Find patients whose appointment fee is higher than the average fee for their doctor.

Query:

SELECT patient_name
FROM Patients P
WHERE EXISTS (
SELECT 1
FROM Appointments A
WHERE A.patient_id = P.patient_id
AND A.fee > (
SELECT AVG(fee)
FROM Appointments
WHERE doctor_id = A.doctor_id
)
);

Explanation:
● The subquery calculates the average fee for each doctor.
● The outer query checks if a patient's fee exceeds their doctor's average fee.
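
With the sample rows given earlier in this section, every doctor has exactly one appointment, so no fee can exceed its doctor's average and the query returns nothing. The sketch below therefore uses a few invented rows in which one doctor has two appointments, run through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (patient_id INTEGER PRIMARY KEY, patient_name TEXT);
CREATE TABLE Appointments (
    appointment_id INTEGER PRIMARY KEY,
    patient_id     INTEGER,
    doctor_id      INTEGER,
    fee            INTEGER
);
-- Illustrative rows: doctor 301 averages 300, doctor 302 averages 300.
INSERT INTO Patients VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO Appointments VALUES
 (1, 1, 301, 400),
 (2, 2, 301, 200),
 (3, 3, 302, 300);
""")

above_average = conn.execute("""
SELECT patient_name
FROM Patients P
WHERE EXISTS (
    SELECT 1
    FROM Appointments A
    WHERE A.patient_id = P.patient_id
      AND A.fee > (
          -- Re-evaluated for each appointment row A: the average for A's doctor.
          SELECT AVG(fee)
          FROM Appointments
          WHERE doctor_id = A.doctor_id
      )
)
ORDER BY patient_name
""").fetchall()

print(above_average)  # [('Ana',)]
```

Cy is excluded because a fee equal to the average (300) does not satisfy the strict > comparison.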

Using EXISTS and NOT EXISTS


Definition:

● EXISTS: Checks if a subquery returns any rows.


● NOT EXISTS: Ensures that a subquery returns no rows.

Scenario 1 (Simple - EXISTS):

List departments with at least one doctor assigned.

Query:

SELECT department_name
FROM Departments D
WHERE EXISTS (
SELECT 1
FROM Doctors Doc
WHERE Doc.department_id = D.department_id
);

Explanation:

1. The subquery checks if a doctor exists for each department.


2. The EXISTS clause ensures only such departments are included.

Scenario 2 (Medium - NOT EXISTS):

Find patients who have not booked any appointments.


Query:

SELECT patient_name
FROM Patients P
WHERE NOT EXISTS (
SELECT 1
FROM Appointments A
WHERE A.patient_id = P.patient_id
);

Explanation:

● The subquery finds patients with appointments.


● The NOT EXISTS clause ensures that only patients without appointments are listed.
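
A common alternative to NOT EXISTS is NOT IN, but the two behave differently when the subquery can return NULL. The sketch below (Python with sqlite3) assumes a hypothetical Appointments row whose patient_id is NULL; table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (patient_id INTEGER PRIMARY KEY, patient_name TEXT);
CREATE TABLE Appointments (appointment_id INTEGER PRIMARY KEY, patient_id INTEGER);
INSERT INTO Patients VALUES (1, 'Ann'), (2, 'Bob');
-- Second appointment has no patient recorded (NULL), an assumed edge case.
INSERT INTO Appointments VALUES (100, 1), (101, NULL);
""")

# NOT EXISTS compares row by row, so the NULL row simply never matches.
not_exists = conn.execute("""
SELECT patient_name
FROM Patients P
WHERE NOT EXISTS (
    SELECT 1 FROM Appointments A WHERE A.patient_id = P.patient_id
)
""").fetchall()

# NOT IN against a list containing NULL evaluates to NULL (unknown),
# so no patient passes the filter.
not_in = conn.execute("""
SELECT patient_name
FROM Patients
WHERE patient_id NOT IN (SELECT patient_id FROM Appointments)
""").fetchall()

print(not_exists)  # patients without appointments
print(not_in)      # empty because of the NULL
```

This is one reason NOT EXISTS is often the safer choice when the compared column is nullable.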

Common Table Expressions (CTEs)


Definition:

A Common Table Expression (CTE) is a temporary result set that simplifies queries and improves
readability. It can also be recursive for hierarchical data.

Scenario 1 (Simple - WITH Clause):

Find the total appointment fees collected by each department.

Query:

WITH DepartmentFees AS (
SELECT D.department_name, SUM(A.fee) AS TotalFees
FROM Appointments A
JOIN Doctors Doc ON A.doctor_id = Doc.doctor_id
JOIN Departments D ON Doc.department_id = D.department_id
WHERE A.status = 'Completed'
GROUP BY D.department_name
)
SELECT department_name, TotalFees
FROM DepartmentFees;

Explanation:

● The CTE calculates total fees for completed appointments by department.


● The main query fetches results from the CTE.

Scenario 2 (Medium - Recursive CTEs):

List the hierarchy of staff reporting under the "Director."

Query:

WITH StaffHierarchy AS (
SELECT staff_id, staff_name, role, manager_id
FROM Staff
WHERE manager_id IS NULL -- Top-level manager (Director)
UNION ALL
SELECT S.staff_id, S.staff_name, S.role, S.manager_id
FROM Staff S
INNER JOIN StaffHierarchy SH ON S.manager_id = SH.staff_id
)
SELECT *
FROM StaffHierarchy;

Explanation:

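1. The recursive CTE starts with the top-level manager (the anchor member, above UNION ALL).
2. It iteratively finds staff reporting under each manager (the recursive member, below UNION ALL).
3. The result shows the full hierarchy.

Note: SQL Server accepts the plain WITH form shown here. PostgreSQL and MySQL 8+ require the
RECURSIVE keyword (WITH RECURSIVE StaffHierarchy AS ...) for a recursive CTE.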
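
The recursion can be traced on a small table. This is a sketch in Python with sqlite3, written with WITH RECURSIVE; the four Staff rows and the extra level column are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staff_id INTEGER PRIMARY KEY, staff_name TEXT,
                    role TEXT, manager_id INTEGER);
-- Hypothetical rows: one Director, two Managers, one Nurse.
INSERT INTO Staff VALUES
  (1, 'Dana', 'Director', NULL),
  (2, 'Eli',  'Manager',  1),
  (3, 'Fay',  'Nurse',    2),
  (4, 'Gus',  'Manager',  1);
""")

hierarchy = conn.execute("""
WITH RECURSIVE StaffHierarchy AS (
    -- Anchor member: the top-level manager has no manager_id.
    SELECT staff_id, staff_name, manager_id, 0 AS level
    FROM Staff
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: staff whose manager is already in the result.
    SELECT S.staff_id, S.staff_name, S.manager_id, SH.level + 1
    FROM Staff S
    JOIN StaffHierarchy SH ON S.manager_id = SH.staff_id
)
SELECT staff_id, staff_name, level
FROM StaffHierarchy
ORDER BY level, staff_id
""").fetchall()

for row in hierarchy:
    print(row)
```

The level column counts how many reporting steps separate each person from the Director.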
5. Set Operations
Insert and Alter commands for set operations:

ALTER TABLE Departments ADD floor VARCHAR(20);

ALTER TABLE Departments ADD head VARCHAR(20);

Departments

INSERT INTO Departments (department_id, department_name, floor, head)

VALUES

(301, 'Cardiology', '2nd', 'Dr. Paul Turner'),

(302, 'Neurology', '3rd', 'Dr. Sarah Bennett'),

(303, 'Orthopedics', '1st', 'Dr. Kevin Hall'),

(304, 'Radiology', '2nd', 'Dr. Alice Morgan'),

(305, 'Dermatology', '1st', 'Dr. Lisa Adams'),

(306, 'Pediatrics', '4th', 'Dr. Emily Clark'),

(307, 'Urology', '3rd', 'Dr. John Miller'),

(308, 'Gastroenterology', '4th', 'Dr. Mark Foster'),

(309, 'Ophthalmology', '3rd', 'Dr. Rebecca Knight'),

(310, 'ENT', '1st', 'Dr. Sam Harris');

Doctors

INSERT INTO Doctors (doctor_id, first_name,last_name, experience_years, department_id)

VALUES

(211, 'Dr. Anna','Cooper', 12, 301),

(212, 'Dr. Matthew','Evans', 8, 302),

(213, 'Dr. Grace','Baker', 5, 303),

(214, 'Dr. Ryan','Carter', 15, 304),

(215, 'Dr. Sophia','Davis', 7, 305),

(216, 'Dr. Ethan','King', 3, 306),


(217, 'Dr. Ava','Taylor', 10, 307),

(218, 'Dr. Noah','Scott', 20, 308),

(219, 'Dr. Mia','Wright', 9, 309),

(220, 'Dr. Lucas','Brown', 6, 310);

Patients

INSERT INTO Patients (patient_id, first_name,last_name, age, gender)

VALUES

(1011, 'William','Johnson', 45, 'Male'),

(1012, 'Olivia','Martinez', 32, 'Female'),

(1013, 'James','Anderson', 50, 'Male'),

(1014, 'Isabella','Moore', 22, 'Female'),

(1015, 'Benjamin','Clark', 27, 'Male'),

(1016, 'Charlotte','Thomas', 60, 'Female'),

(1017, 'Henry','Lewis', 33, 'Male'),

(1018, 'Sophia','Harris', 40, 'Female'),

(1019, 'Alexander','Walker', 55, 'Male'),

(1020, 'Emma','Hall', 38, 'Female');

Appointments

INSERT INTO Appointments (appointment_id, patient_id, doctor_id, appointment_date, fee, status)

VALUES

(5111, 1011, 211, '2024-11-01', 300, 'Completed'),

(5112, 1012, 212, '2024-11-03', 200, 'Scheduled'),

(5113, 1013, 213, '2024-11-05', 250, 'Completed'),

(5114, 1014, 214, '2024-11-07', 150, 'Cancelled'),

(5115, 1015, 215, '2024-11-09', 350, 'Completed'),

(5116, 1016, 216, '2024-11-11', 400, 'Completed'),

(5117, 1017, 217, '2024-11-13', 450, 'Scheduled'),

(5118, 1018, 218, '2024-11-15', 500, 'Completed'),

(5119, 1019, 219, '2024-11-17', 300, 'Cancelled'),

(5120, 1020, 220, '2024-11-19', 250, 'Completed');


1. UNION and UNION ALL

Definition:

● UNION combines the results of two or more queries and removes duplicates.
● UNION ALL combines the results but keeps duplicates.

Business Scenario 1:

Find all unique names of hospital staff and doctors.

Query (Using UNION):

SELECT staff_name AS name FROM Staff


UNION
SELECT doctor_name AS name FROM Doctors;

Explanation:

● Combines staff names and doctor names into a single result set.
● Removes duplicate names, providing a list of unique names across both groups.

Business Scenario 2:

Find all names of hospital staff and doctors, including duplicates.

Query (Using UNION ALL):

SELECT staff_name AS name FROM Staff


UNION ALL
SELECT doctor_name AS name FROM Doctors;

Explanation:

● Similar to UNION, but it keeps duplicates.


● Useful for analyzing repeated names across both groups.

2. INTERSECT

Definition:

INTERSECT retrieves only the rows that are common between two queries.

Business Scenario:

Identify names that are common between the Staff and Doctors tables (e.g., a staff member who is
also a doctor).

Query:

SELECT staff_name AS name FROM Staff


INTERSECT
SELECT doctor_name AS name FROM Doctors;

Explanation:

● Finds overlapping names between staff and doctors.


● Useful for identifying shared roles, such as a doctor doubling as a part-time staff member.

3. EXCEPT

Definition:

EXCEPT retrieves rows from the first query that are not present in the second query.

Business Scenario 1:

Find names of doctors who are not listed as hospital staff.

Query:

SELECT doctor_name AS name FROM Doctors


EXCEPT
SELECT staff_name AS name FROM Staff;

Explanation:

● Returns doctor names that do not appear in the staff list.


● Useful for finding doctors who only consult but are not employed full-time.

Business Scenario 2:

Identify staff members who are not doctors.

Query:

SELECT staff_name AS name FROM Staff


EXCEPT
SELECT doctor_name AS name FROM Doctors;

Explanation:

● Returns staff names that are not found in the list of doctors.
● Helps in understanding the distinct roles within the hospital.

Combining Results from Multiple Queries

Definition:

This involves integrating or comparing data from different queries using UNION, INTERSECT, or
EXCEPT.

Business Scenario:

Generate a complete analysis of all names, including overlaps and exclusivity.

Step 1 (Unique Names):

SELECT staff_name AS name FROM Staff


UNION
SELECT doctor_name AS name FROM Doctors;
Step 2 (Common Names):

SELECT staff_name AS name FROM Staff


INTERSECT
SELECT doctor_name AS name FROM Doctors;

Step 3 (Exclusive to Doctors):

SELECT doctor_name AS name FROM Doctors


EXCEPT
SELECT staff_name AS name FROM Staff;

Explanation:

● Step 1 shows all unique names across staff and doctors.


● Step 2 identifies shared names.
● Step 3 isolates names exclusive to doctors, enabling a detailed breakdown.

Summary:

● UNION: Combines and removes duplicates (unique results).


● UNION ALL: Combines and retains duplicates.
● INTERSECT: Shows common elements between queries.
● EXCEPT: Shows elements in the first query that are not in the second.
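
The four operators in the summary can be run side by side. Below is a small sketch in Python with sqlite3; the Staff and Doctors rows are hypothetical, chosen so that one name ("Bob") repeats within Staff and also appears in Doctors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staff_id INTEGER PRIMARY KEY, staff_name TEXT);
CREATE TABLE Doctors (doctor_id INTEGER PRIMARY KEY, doctor_name TEXT);
INSERT INTO Staff VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Bob');
INSERT INTO Doctors VALUES (1, 'Bob'), (2, 'Cara');
""")

def names(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

staff = "SELECT staff_name AS name FROM Staff"
doctors = "SELECT doctor_name AS name FROM Doctors"

union_names = names(f"{staff} UNION {doctors}")          # duplicates removed
union_all_names = names(f"{staff} UNION ALL {doctors}")  # duplicates kept
common_names = names(f"{staff} INTERSECT {doctors}")     # in both tables
doctor_only = names(f"{doctors} EXCEPT {staff}")         # doctors not in staff

print(union_names, union_all_names, common_names, doctor_only)
```

Set operators compare whole rows by value, so two different people who share a name are treated as the same entry.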
CASE WHEN Statement

Definition:

The CASE WHEN statement applies conditional logic to query results, transforming or classifying data
based on specified criteria.

Basic Syntax for CASE WHEN Statement:


SELECT
column1,
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE result_default
END AS alias_name
FROM table_name;

1. CASE WHEN Statement: Examples

Basic Scenario:

Classify patients as "Minor", "Adult", or "Senior" based on their age.

Query:

SELECT
patient_id,
patient_name,
age,
CASE
WHEN age < 18 THEN 'Minor'
WHEN age BETWEEN 18 AND 60 THEN 'Adult'
ELSE 'Senior'
END AS age_group
FROM Patients;

Explanation:

1. CASE checks the age column.


2. Assigns "Minor" if age is less than 18, "Adult" for ages 18–60, and "Senior" otherwise.
3. Outputs a new column age_group alongside patient details.
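
It can help to test the boundaries of a CASE expression. The sketch below (Python with sqlite3) uses a hypothetical Patients table holding only the edge ages 17, 18, 60, 61 and one NULL age to show which branch each value takes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (patient_id INTEGER PRIMARY KEY, age INTEGER);
-- Hypothetical boundary values, plus one patient with unknown age.
INSERT INTO Patients VALUES (1, 17), (2, 18), (3, 60), (4, 61), (5, NULL);
""")

groups = conn.execute("""
SELECT
    age,
    CASE
        WHEN age < 18 THEN 'Minor'
        WHEN age BETWEEN 18 AND 60 THEN 'Adult'   -- BETWEEN is inclusive
        ELSE 'Senior'
    END AS age_group
FROM Patients
ORDER BY patient_id
""").fetchall()

for row in groups:
    print(row)
```

Note that a NULL age matches neither WHEN condition and therefore falls into the ELSE branch; adding an explicit WHEN age IS NULL branch is one way to label such rows separately.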

Medium Scenario:

Determine whether appointments are "On Time", "Late", or "Missed" based on appointment status.

Query:

SELECT
appointment_id,
patient_id,
doctor_id,
status,
CASE
WHEN status = 'Completed' THEN 'On Time'
WHEN status = 'Cancelled' THEN 'Missed'
ELSE 'Late'
END AS appointment_status
FROM Appointments;

Explanation:

1. The CASE checks the status column of appointments.


2. Assigns "On Time" for Completed, "Missed" for Cancelled, and "Late" otherwise.
3. Adds a new column appointment_status showing the appointment’s classification.

CASE WHEN with GROUP BY

Definition:

Combining CASE WHEN with GROUP BY allows aggregation based on classifications.


Basic Syntax for CASE WHEN with GROUP BY:
SELECT
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE result_default
END AS alias_name,
aggregate_function(column_name)
FROM table_name
GROUP BY
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE result_default
END;

2. CASE WHEN with GROUP BY: Examples

Basic Scenario:

Count the number of patients in each age group.

Query:

SELECT
CASE
WHEN age < 18 THEN 'Minor'
WHEN age BETWEEN 18 AND 60 THEN 'Adult'
ELSE 'Senior'
END AS age_group,
COUNT(*) AS patient_count
FROM Patients
GROUP BY
CASE
WHEN age < 18 THEN 'Minor'
WHEN age BETWEEN 18 AND 60 THEN 'Adult'
ELSE 'Senior'
END;

Explanation:

1. Groups patients into "Minor", "Adult", and "Senior" based on age.


2. Aggregates the count (COUNT(*)) of patients in each group.
3. Produces a summary table of age group demographics.

Medium Scenario:

Calculate the total revenue collected for "Completed" and "Cancelled" appointments.

Query:

SELECT
CASE
WHEN status = 'Completed' THEN 'Completed'
ELSE 'Cancelled'
END AS appointment_status,
SUM(fee) AS total_revenue
FROM Appointments
WHERE status IN ('Completed', 'Cancelled')
GROUP BY
CASE
WHEN status = 'Completed' THEN 'Completed'
ELSE 'Cancelled'
END;

Explanation:

1. The WHERE clause keeps only Completed and Cancelled appointments. Without it, the ELSE branch
would also count Scheduled appointments as Cancelled.
2. Groups the remaining appointments into Completed and Cancelled.
3. Calculates the total fees (SUM(fee)) for each group.
4. Useful for analyzing the financial impact of completed and cancelled appointments.
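
A two-branch CASE sends every status other than 'Completed' to the ELSE branch, including 'Scheduled'. The sketch below (Python with sqlite3, three hypothetical appointments) compares the grouped totals with and without a WHERE filter on status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (appointment_id INTEGER PRIMARY KEY,
                           fee INTEGER, status TEXT);
-- Hypothetical rows, one per status.
INSERT INTO Appointments VALUES
  (1, 300, 'Completed'), (2, 200, 'Scheduled'), (3, 150, 'Cancelled');
""")

case_expr = "CASE WHEN status = 'Completed' THEN 'Completed' ELSE 'Cancelled' END"

def revenue(where=""):
    """Total fee per CASE label, optionally restricted by a WHERE clause."""
    sql = (f"SELECT {case_expr} AS label, SUM(fee) FROM Appointments {where} "
           f"GROUP BY {case_expr} ORDER BY label")
    return conn.execute(sql).fetchall()

unfiltered = revenue()  # the Scheduled fee is added to 'Cancelled'
filtered = revenue("WHERE status IN ('Completed', 'Cancelled')")

print(unfiltered)
print(filtered)
```

An alternative is to list every status in its own WHEN branch so that nothing reaches ELSE by accident.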

Advanced Scenario:

Count the appointments of each doctor for each appointment status, listing each doctor's most
frequent status first.

Query:

SELECT
doctor_id,
CASE
WHEN status = 'Completed' THEN 'Completed'
WHEN status = 'Cancelled' THEN 'Cancelled'
ELSE 'Scheduled'
END AS appointment_status,
COUNT(*) AS total_appointments
FROM Appointments
GROUP BY
doctor_id,
CASE
WHEN status = 'Completed' THEN 'Completed'
WHEN status = 'Cancelled' THEN 'Cancelled'
ELSE 'Scheduled'
END
ORDER BY doctor_id, total_appointments DESC;

Explanation:

1. Groups appointments by doctor_id and status.


2. Counts the total appointments for each doctor and status group.
3. Helps track the workload or cancellation trends for each doctor.

Key Takeaways
Topic | Syntax | Use Case | Difficulty | Key Learning
CASE WHEN | Single condition | Classify data based on rules | Basic | Applies logic to transform data dynamically.
CASE WHEN with GROUP BY | Aggregate with condition | Summarize data by classified groups | Medium | Enables grouped analysis for actionable insights.
CASE WHEN with Aggregation | Multiple group conditions | Advanced grouping with logic | Advanced | Combines grouping, aggregation logic, and summarization.

Window Functions in SQL

Definition:

Window functions allow you to perform calculations across a "window" of rows related to the
current row without collapsing the result set. They are particularly useful for ranking, running totals,
comparisons, and advanced analytics.

Key Elements of Window Functions

1. OVER() Clause: Defines the scope or "window" of rows for the function.
2. PARTITION BY Clause: Divides the dataset into partitions (groups) for calculations.
3. ORDER BY Clause: Specifies the order of rows within each partition or window.
4. Types of Window Functions:
○ Ranking Functions: Assign a rank or position to rows (e.g., ROW_NUMBER,
RANK, DENSE_RANK).
○ Aggregate Functions as Window Functions: Perform calculations such as SUM,
AVG, or COUNT but maintain row-level data.
○ Analytic Functions: Provide advanced insights, such as LEAD, LAG,
FIRST_VALUE, LAST_VALUE, or NTILE.
5. Frame Clause: Specifies subsets of rows for each calculation using ROWS BETWEEN or
RANGE BETWEEN.

1. Introduction to Window Functions

Basic Syntax:
SELECT
column1,
window_function() OVER ([PARTITION BY column2] [ORDER BY column3]) AS
alias_name
FROM table_name;

Examples

Simple Query:

Find the total revenue for each doctor while keeping row-level appointment details.

Query:

SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (PARTITION BY doctor_id) AS total_revenue_by_doctor
FROM Appointments;
Output:

doctor_id appointment_id fee total_revenue_by_doctor

301 201 500 2600

301 203 700 2600

301 205 600 2600

301 208 800 2600

302 206 500 500

302 202 0 500

303 204 400 400

303 207 0 400

Explanation:

● The OVER(PARTITION BY doctor_id) clause groups rows by doctor_id.


● SUM(fee) calculates the total fee for each doctor, while maintaining row-level detail.

Medium Query:

Rank appointments for each doctor based on fee in descending order.

Query:

SELECT
doctor_id,
appointment_id,
fee,
RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_by_fee
FROM Appointments;
Output:

doctor_id appointment_id fee rank_by_fee

301 208 800 1

301 203 700 2

301 205 600 3

301 201 500 4

302 206 500 1

302 202 0 2

303 204 400 1

303 207 0 2

Explanation:

● PARTITION BY doctor_id groups appointments by doctor.


● ORDER BY fee DESC ranks appointments within each doctor group based on the fee in
descending order.
● RANK() assigns a rank, skipping numbers when there are ties.

2. Analytic Functions

Basic Syntax for Analytic Functions:


SELECT
column1,
analytic_function(column2) OVER ([PARTITION BY column3] [ORDER BY column4]) AS
alias_name
FROM table_name;

Simple Query:

Identify the previous appointment fee for each appointment of a doctor.


Query:

SELECT
doctor_id,
appointment_id,
fee,
LAG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date) AS
previous_fee
FROM Appointments;

Output:

doctor_id appointment_id fee previous_fee

301 201 500 NULL

301 203 700 500

301 205 600 700

301 208 800 600

302 206 500 NULL

302 202 0 500

303 204 400 NULL

303 207 0 400

Explanation:

● LAG(fee) gets the fee from the previous appointment for the same doctor.
● PARTITION BY doctor_id ORDER BY appointment_date ensures that fees are ordered
chronologically within each doctor.

Medium Query:

Find the cumulative total of appointment fees for each doctor over time.
Query:

SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_fee
FROM Appointments;

Output:

doctor_id appointment_id fee cumulative_fee

301 201 500 500

301 203 700 1200

301 205 600 1800

301 208 800 2600

302 206 500 500

302 202 0 500

303 204 400 400

303 207 0 400

Explanation:

● SUM(fee) calculates a running total of fees.


● ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW defines the range of rows
for the cumulative sum: from the first row to the current row.

3. Frame Clause

Simple Query:
Calculate a 3-appointment moving average of fees for each doctor (the current appointment and the
two before it; the frame counts rows, not calendar days).

Query:

SELECT
doctor_id,
appointment_id,
fee,
AVG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date ROWS
BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_fee
FROM Appointments;

Output:

doctor_id appointment_id fee moving_avg_fee

301 201 500 500.0

301 203 700 600.0

301 205 600 600.0

301 208 800 700.0

302 206 500 500.0

302 202 0 250.0

303 204 400 400.0

303 207 0 200.0

Explanation:

● The window for AVG(fee) includes the current row and the two preceding rows.
● Provides a rolling average for trend analysis.
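
The moving average can be checked by hand on one doctor's rows. This sketch (Python with sqlite3; window functions need SQLite 3.25 or newer) reuses the four fees shown for doctor 301; the appointment dates are assumed, since the output table does not list them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (appointment_id INTEGER PRIMARY KEY, doctor_id INTEGER,
                           appointment_date TEXT, fee INTEGER);
-- Fees as in the sample output; the dates are illustrative assumptions.
INSERT INTO Appointments VALUES
  (201, 301, '2024-11-01', 500),
  (203, 301, '2024-11-02', 700),
  (205, 301, '2024-11-03', 600),
  (208, 301, '2024-11-04', 800);
""")

moving_avg = conn.execute("""
SELECT
    appointment_id,
    AVG(fee) OVER (
        PARTITION BY doctor_id
        ORDER BY appointment_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW   -- at most three rows
    ) AS moving_avg_fee
FROM Appointments
ORDER BY appointment_date
""").fetchall()

for row in moving_avg:
    print(row)
```

The first two rows average over fewer than three values because fewer preceding rows exist: 500, then (500 + 700) / 2, then (500 + 700 + 600) / 3, then (700 + 600 + 800) / 3.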

Key Takeaways
Feature | Usage | Difficulty | Scenario
OVER() Clause | Define window for calculations | Basic | Used in all window functions.
PARTITION BY Clause | Group rows for window functions | Basic | Categorize data by groups.
Ranking Functions | Assign ranks to rows | Medium | Track order or importance of rows.
Analytic Functions | Compare current row with other rows | Medium | Lead/Lag, trends, and row comparisons.
Frame Clause | Define row ranges for advanced calculations | Advanced | Moving averages or running totals.

1. Ranking Functions

Example 1: ROW_NUMBER

Rank appointments for each doctor based on the appointment fee in descending order.

Query:

SELECT
doctor_id,
appointment_id,
fee,
ROW_NUMBER() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
row_number_rank
FROM Appointments;

Explanation:

● Assigns a unique rank (row_number_rank) to each appointment of a doctor based on the fee,
starting from the highest.
● PARTITION BY doctor_id ensures ranking is done separately for each doctor.

Example 2: RANK

Rank appointments for each doctor based on the appointment fee in descending order (with ties).
Query:

SELECT
doctor_id,
appointment_id,
fee,
RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_with_ties
FROM Appointments;

Explanation:

● Assigns a rank while leaving gaps for ties (e.g., if two rows tie at rank 1, the next rank is 3).
● Useful to compare values where ties matter.

2. Aggregate Functions as Window Functions

Example 1: SUM

Calculate the cumulative revenue (fees) collected by each doctor.

Query:

SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date) AS
cumulative_revenue
FROM Appointments;

Explanation:

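● The cumulative revenue for each doctor is calculated by summing fees in chronological order
using ORDER BY appointment_date.
● PARTITION BY doctor_id ensures that the sum is calculated for each doctor independently.
● With ORDER BY and no explicit frame, the default frame is RANGE BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW, so appointments that share the same appointment_date receive the
same cumulative value.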
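
When two appointments fall on the same date, the default frame (RANGE) and an explicit ROWS frame give different running totals. A minimal sketch in Python with sqlite3 follows; the three rows are hypothetical, with two of them sharing a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (appointment_id INTEGER PRIMARY KEY, doctor_id INTEGER,
                           appointment_date TEXT, fee INTEGER);
-- Hypothetical rows: appointments 2 and 3 share the same date.
INSERT INTO Appointments VALUES
  (1, 1, '2024-11-01', 100),
  (2, 1, '2024-11-02', 200),
  (3, 1, '2024-11-02', 300);
""")

totals = conn.execute("""
SELECT
    appointment_id,
    fee,
    -- Default frame: rows with the same date are peers and are summed together.
    SUM(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date) AS range_total,
    -- Explicit ROWS frame with a tie-breaker: strictly one row at a time.
    SUM(fee) OVER (PARTITION BY doctor_id
                   ORDER BY appointment_date, appointment_id
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total
FROM Appointments
ORDER BY appointment_id
""").fetchall()

for row in totals:
    print(row)
```

Adding a unique column such as appointment_id to ORDER BY is one way to make the running total deterministic when dates repeat.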

Example 2: AVG
Find the average fee collected by each doctor, shown alongside each appointment.

Query:

SELECT
doctor_id,
appointment_id,
fee,
AVG(fee) OVER (PARTITION BY doctor_id) AS avg_fee_by_doctor
FROM Appointments;

Explanation:

● AVG(fee) calculates the average fee per doctor.


● Unlike a GROUP BY query, this result is displayed with each row.

3. Analytic Functions

Example 1: LEAD

Get the fee for the next appointment for each doctor.

Query:

SELECT
doctor_id,
appointment_id,
fee,
LEAD(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date) AS
next_appointment_fee
FROM Appointments;

Explanation:

● LEAD(fee) retrieves the fee of the next appointment in chronological order for the same
doctor.
● Helps to compare fees across consecutive appointments.
Example 2: FIRST_VALUE

Find the fee of the first appointment for each doctor.

Query:

SELECT
doctor_id,
appointment_id,
fee,
FIRST_VALUE(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date) AS
first_appointment_fee
FROM Appointments;

Explanation:

● FIRST_VALUE(fee) picks the fee from the earliest appointment for each doctor, based on
appointment_date.
● Useful for determining baseline or initial data values.
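
LAST_VALUE, listed earlier among the analytic functions, does not mirror FIRST_VALUE under the default frame, because that frame ends at the current row. The sketch below (Python with sqlite3) reuses the fees shown for doctor 301; the dates are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (appointment_id INTEGER PRIMARY KEY, doctor_id INTEGER,
                           appointment_date TEXT, fee INTEGER);
-- Fees as in the sample output; the dates are illustrative assumptions.
INSERT INTO Appointments VALUES
  (201, 301, '2024-11-01', 500),
  (203, 301, '2024-11-02', 700),
  (205, 301, '2024-11-03', 600),
  (208, 301, '2024-11-04', 800);
""")

first_last = conn.execute("""
SELECT
    appointment_id,
    FIRST_VALUE(fee) OVER (PARTITION BY doctor_id
                           ORDER BY appointment_date) AS first_fee,
    -- Default frame stops at the current row, so this is the current fee.
    LAST_VALUE(fee) OVER (PARTITION BY doctor_id
                          ORDER BY appointment_date) AS last_fee_default,
    -- Extending the frame to the end of the partition gives the true last fee.
    LAST_VALUE(fee) OVER (PARTITION BY doctor_id
                          ORDER BY appointment_date
                          ROWS BETWEEN UNBOUNDED PRECEDING
                                   AND UNBOUNDED FOLLOWING) AS last_fee_full
FROM Appointments
ORDER BY appointment_date
""").fetchall()

for row in first_last:
    print(row)
```

FIRST_VALUE is unaffected because the first row of the partition is always inside the default frame.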

Key Comparison of Functions


Function Type | Scenario | Purpose
Ranking Functions | Ranking rows (with or without ties) | Useful for ordered comparisons or competitions.
Aggregate as Window Functions | Cumulative or rolling totals, averages, or counts | Retains row-level detail while aggregating.
Analytic Functions | Comparing rows (e.g., next/previous, first/last) | Insight into relationships between rows.

These examples show how to use window functions to analyze data without collapsing it, offering
flexibility in handling real-world business scenarios.

Ranking Functions: Examples for ROW_NUMBER, RANK, DENSE_RANK

1. ROW_NUMBER: Assign Unique Ranks

Scenario 1: Rank appointments for each doctor based on the appointment fee in descending order,
assigning unique ranks even for identical fees.

Query:

SELECT
doctor_id,
appointment_id,
fee,
ROW_NUMBER() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
unique_rank
FROM Appointments;

Explanation:

● ROW_NUMBER assigns a unique rank to each row within a group defined by PARTITION
BY doctor_id.
● ORDER BY fee DESC ranks appointments by fee in descending order.
● No ties are allowed; each row gets a distinct rank, regardless of identical fees.

Output:

doctor_id appointment_id fee unique_rank

301 208 800 1

301 203 700 2

301 205 600 3

301 201 500 4

302 206 500 1

302 202 0 2
2. RANK: Assign Ranks with Ties

Scenario 2: Rank appointments for each doctor based on fee in descending order, allowing tied
ranks.

Query:

SELECT
doctor_id,
appointment_id,
fee,
RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_with_ties
FROM Appointments;

Explanation:

● RANK assigns ranks to rows within each group defined by PARTITION BY doctor_id.
● Tied rows (rows with identical fees) receive the same rank.
● The rank sequence skips numbers after ties (e.g., if two rows tie at rank 1, the next rank will
be 3).

Output:

doctor_id appointment_id fee rank_with_ties

301 208 800 1

301 203 700 2

301 205 600 3

301 201 500 4

302 206 500 1

302 202 0 2

3. DENSE_RANK: Assign Ranks Without Gaps

Scenario 3: Rank appointments for each doctor based on fee in descending order, ensuring ranks are
consecutive even with ties.
Query:

SELECT
doctor_id,
appointment_id,
fee,
DENSE_RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
dense_rank
FROM Appointments;

Explanation:

● DENSE_RANK works like RANK, but it does not skip numbers after ties.
● Tied rows receive the same rank, and the next rank is incremented consecutively.

Output:

doctor_id appointment_id fee dense_rank

301 208 800 1

301 203 700 2

301 205 600 3

301 201 500 4

302 206 500 1

302 202 0 2

Comparison of ROW_NUMBER, RANK, and DENSE_RANK


Function | Behavior | Use Case
ROW_NUMBER | Assigns a unique rank to each row. Ties are broken arbitrarily, and no gaps occur. | When a strict sequence is required.
RANK | Allows tied rows to have the same rank but skips numbers after ties. | When ties are important, but skipping ranks is acceptable.
DENSE_RANK | Allows tied rows to have the same rank, and the next rank is assigned consecutively. | When ties are important, and no gaps are desired.

Example of a Combined Query

To compare all three functions side-by-side:

Query:

SELECT
doctor_id,
appointment_id,
fee,
ROW_NUMBER() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
row_number_rank,
RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_with_ties,
DENSE_RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
dense_rank
FROM Appointments;

Output:

doctor_id appointment_id fee row_number_rank rank_with_ties dense_rank

301 208 800 1 1 1

301 203 700 2 2 2

301 205 600 3 3 3

301 201 500 4 4 4

302 206 500 1 1 1

302 202 0 2 2 2

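Because no doctor in this sample data has two appointments with the same fee, all three functions
return identical values here. Their results differ only when fees tie within a doctor's appointments.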
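
To see the three functions diverge, the data needs a tie. The sketch below (Python with sqlite3) uses three hypothetical appointments for one doctor, two of which share the top fee; appointment_id is added to the ROW_NUMBER ordering only to make the tie-break repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (appointment_id INTEGER PRIMARY KEY,
                           doctor_id INTEGER, fee INTEGER);
-- Hypothetical rows: appointments 1 and 2 tie on the highest fee.
INSERT INTO Appointments VALUES (1, 301, 800), (2, 301, 800), (3, 301, 600);
""")

ranks = conn.execute("""
SELECT
    appointment_id,
    fee,
    ROW_NUMBER() OVER (PARTITION BY doctor_id
                       ORDER BY fee DESC, appointment_id) AS row_number_rank,
    RANK()       OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_with_ties,
    DENSE_RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS dense_rank_value
FROM Appointments
ORDER BY appointment_id
""").fetchall()

for row in ranks:
    print(row)
```

For the 600 fee, ROW_NUMBER and RANK both give 3 (RANK skips 2 after the tie), while DENSE_RANK gives 2.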


Ranking Functions: Clear Definitions

1. ROW_NUMBER: Assigns Unique Ranks

Definition:
The ROW_NUMBER function assigns a unique sequential number to each row within a partition
(group of rows). Even if multiple rows have the same values in the column(s) being ordered,
ROW_NUMBER will not allow ties and will assign a unique number to each row.

Use Case:
When you need a strict sequence for rows, regardless of whether they are identical in ranking
criteria.

2. RANK: Allows Ties and Skips Ranks

Definition:
The RANK function assigns a rank to rows within a partition. Rows with identical values (ties) are
assigned the same rank. After a tie, the ranking will skip numbers (gaps in ranks).

Use Case:
When you want to identify ties and maintain rank gaps for analysis (e.g., competitions or fee
rankings).

3. DENSE_RANK: Allows Ties Without Skipping Ranks

Definition:
The DENSE_RANK function works like RANK, but it does not skip numbers after ties. The next
rank is always consecutive, regardless of ties.

Use Case:
When you want tied rows to share the same rank and require consecutive ranks without gaps.

Code Examples and Explanations

1. ROW_NUMBER: Assign Unique Ranks

Scenario: Rank appointments for each doctor by fee in descending order, with no ties allowed.
Query:

SELECT
doctor_id,
appointment_id,
fee,
ROW_NUMBER() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
unique_rank
FROM Appointments;

Explanation:

● ROW_NUMBER(): Assigns a unique rank starting from 1 for each row in the group.
● PARTITION BY doctor_id: Each doctor's appointments are ranked independently.
● ORDER BY fee DESC: Fees are ranked from highest to lowest.
● Even if two fees are the same, they are assigned different ranks.

2. RANK: Allow Ties and Skip Ranks

Scenario: Rank appointments for each doctor by fee, allowing ties.

Query:

SELECT
doctor_id,
appointment_id,
fee,
RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_with_ties
FROM Appointments;

Explanation:

● RANK(): Assigns ranks starting from 1, allowing tied rows to share the same rank.
● Tied rows: If two rows have the same fee, they will share the same rank.
● Skipping ranks: If two rows tie at rank 1, the next row will have rank 3.

3. DENSE_RANK: Allow Ties Without Skipping Ranks


Scenario: Rank appointments for each doctor by fee, ensuring consecutive ranks.

Query:

SELECT
doctor_id,
appointment_id,
fee,
DENSE_RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
dense_rank
FROM Appointments;

Explanation:

● DENSE_RANK(): Similar to RANK(), but the next rank after a tie is consecutive (no
skipped ranks).
● Useful when analyzing ranks without any gaps, such as performance levels.

Key Differences and Comparison


Function | Definition | Tie Behavior | Rank Sequence
ROW_NUMBER | Assigns a unique rank to each row, no ties allowed. | No ties; unique for each row. | Always sequential.
RANK | Assigns rank with ties; skips ranks after ties. | Tied rows share the same rank. | Gaps exist after ties.
DENSE_RANK | Assigns rank with ties; no gaps in rank sequence. | Tied rows share the same rank. | Consecutive, no gaps.

Combined Example

Scenario: Compare all three ranking functions to analyze doctor appointment fees.

Query:

SELECT
doctor_id,
appointment_id,
fee,
ROW_NUMBER() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
row_number_rank,
RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_with_ties,
DENSE_RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
dense_rank
FROM Appointments;

Output:

doctor_id appointment_id fee row_number_rank rank_with_ties dense_rank

301 208 800 1 1 1

301 203 700 2 2 2

301 205 600 3 3 3

301 201 500 4 4 4

302 206 500 1 1 1

302 202 0 2 2 2

Because this sample contains no tied fees within a doctor, ROW_NUMBER, RANK, and DENSE_RANK
return the same values here; their behaviors diverge only when two or more rows share a fee.


Window Functions: SUM, AVG, COUNT

Window functions allow us to perform calculations across rows that are related to the current row,
without collapsing the data into aggregated results. This is particularly useful when you need to
maintain row-level data while adding insights.
1. SUM as a Window Function

Definition:

The SUM window function calculates a running total or group total for each row in the dataset,
maintaining the original row structure.

Example 1: Calculate Total Fees by Doctor

Scenario: You want to know the total fees collected by each doctor for their appointments while
keeping row-level data.

Query:

SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (PARTITION BY doctor_id) AS total_fees_per_doctor
FROM Appointments;

Explanation:

● SUM(fee): Calculates the sum of fees for each doctor.


● OVER (PARTITION BY doctor_id): Groups rows by doctor_id so that the total fees are
calculated separately for each doctor.
● The original rows remain intact, but an additional column with the total fees is added.

Output:

doctor_id appointment_id fee total_fees_per_doctor

301 208 800 2600

301 203 700 2600

301 205 600 2600

301 201 500 2600

302 206 500 500


302 202 0 500

Example 2: Calculate Running Total of Fees

Scenario: You want to calculate the cumulative fees collected by each doctor across their
appointments in descending fee order.

Query:

SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
running_total_fees
FROM Appointments;

Explanation:

● ORDER BY fee DESC: Calculates the running total in descending order of fees.
● Each row contains the cumulative total up to that row within the partition (doctor_id).

Output:

doctor_id appointment_id fee running_total_fees

301 208 800 800

301 203 700 1500

301 205 600 2100

301 201 500 2600

302 206 500 500

302 202 0 500

2. AVG as a Window Function


Definition:

The AVG window function calculates the average of a column across rows in a group while keeping
the individual rows intact.

Example 1: Calculate Average Fee per Doctor

Scenario: You want to find the average fee collected by each doctor for their appointments.

Query:

SELECT
doctor_id,
appointment_id,
fee,
AVG(fee) OVER (PARTITION BY doctor_id) AS avg_fee_per_doctor
FROM Appointments;

Explanation:

● AVG(fee): Calculates the average fee within each doctor group.


● PARTITION BY doctor_id: Ensures averages are calculated separately for each doctor.

Output:

doctor_id appointment_id fee avg_fee_per_doctor

301 208 800 650

301 203 700 650

301 205 600 650

301 201 500 650

302 206 500 250

302 202 0 250

Example 2: Calculate Running Average of Fees


Scenario: You want to calculate the cumulative average of fees for each doctor in descending fee
order.

Query:

SELECT
doctor_id,
appointment_id,
fee,
AVG(fee) OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS running_avg_fee
FROM Appointments;

Explanation:

● ORDER BY fee DESC: Calculates the running average in descending order of fees.
● Each row shows the average of all fees up to that row within the partition (doctor_id).

Output:

doctor_id appointment_id fee running_avg_fee

301 208 800 800

301 203 700 750

301 205 600 700

301 201 500 650

302 206 500 500

302 202 0 250

3. COUNT as a Window Function

Definition:

The COUNT window function calculates the number of rows in a group or partition while
maintaining row-level data.
Example 1: Count Total Appointments Per Doctor

Scenario: You want to know how many appointments each doctor has.

Query:

SELECT
doctor_id,
appointment_id,
fee,
COUNT(appointment_id) OVER (PARTITION BY doctor_id) AS total_appointments
FROM Appointments;

Explanation:

● COUNT(appointment_id): Counts the number of appointments for each doctor.


● PARTITION BY doctor_id: Ensures the count is calculated separately for each doctor.

Output:

doctor_id appointment_id fee total_appointments

301 208 800 4

301 203 700 4

301 205 600 4

301 201 500 4

302 206 500 2

302 202 0 2

Example 2: Count Running Appointments

Scenario: Count the cumulative number of appointments for each doctor in descending fee order.

Query:

SELECT
doctor_id,
appointment_id,
fee,
COUNT(appointment_id) OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
running_count
FROM Appointments;

Explanation:

● ORDER BY fee DESC: Calculates a running count based on descending fee order.
● Each row contains the count of appointments up to that row.

Output:

doctor_id appointment_id fee running_count

301 208 800 1

301 203 700 2

301 205 600 3

301 201 500 4

302 206 500 1

302 202 0 2
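
The two COUNT variants can be compared side by side in a small runnable sketch. It uses Python's built-in sqlite3 module; the table layout and sample rows are assumptions reconstructed from the output tables in this section, and SQLite 3.25 or later is assumed.

```python
import sqlite3

# Assumed sample rows (appointment_id, doctor_id, fee), reconstructed from the output tables.
sample_rows = [(201, 301, 500), (202, 302, 0), (203, 301, 700),
               (205, 301, 600), (206, 302, 500), (208, 301, 800)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)")
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

count_rows = conn.execute("""
    SELECT doctor_id, appointment_id, fee,
           -- Same value on every row of the partition.
           COUNT(appointment_id) OVER (PARTITION BY doctor_id) AS total_appointments,
           -- Grows by one per row because no two fees tie within a doctor.
           COUNT(appointment_id) OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS running_count
    FROM Appointments
    ORDER BY doctor_id, fee DESC
""").fetchall()

for row in count_rows:
    print(row)
```

If two appointments of the same doctor had the same fee, both rows would receive the same running count, because the default frame includes every row that ties with the current one.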

Summary of Window Function Usage


Function | Definition | Example Use Case

SUM | Calculates a cumulative or grouped total while maintaining row-level data. | Total fees collected per doctor or running total fees.

AVG | Calculates a cumulative or grouped average while maintaining row-level data. | Average fees per doctor or running average fees.

COUNT | Calculates the count of rows while maintaining row-level data. | Total appointments per doctor or running count of appointments.

You said:
Give two examples for each window function listed below, based on the above dataset, along with
code. I need a clear and easy definition of each function. Analytic Functions: Provide advanced
insights, such as LEAD, LAG, FIRST_VALUE, LAST_VALUE, or NTILE.

ChatGPT said:

Window Functions: Analytic Functions

Analytic functions provide advanced insights by analyzing and comparing data within partitions or
rows. These include LEAD, LAG, FIRST_VALUE, LAST_VALUE, and NTILE. They help in
scenarios like trend analysis, identifying first or last records, or dividing data into equal parts.

1. LEAD Function

Definition:

The LEAD function retrieves the value of a column from the next row in the same partition. It's used
to compare current rows with future rows.

Example 1: Find the Next Appointment Fee

Scenario: You want to know the fee of the next appointment for each doctor.

Query:

SELECT
doctor_id,
appointment_id,
fee,
LEAD(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS next_appointment_fee
FROM Appointments;

Explanation:

● LEAD(fee): Fetches the fee of the next row within the same partition (doctor_id).
● ORDER BY appointment_id: Determines the order in which the next row is fetched.

Output:

doctor_id appointment_id fee next_appointment_fee

301 201 500 700

301 203 700 600

301 205 600 800

301 208 800 NULL

302 202 0 500

302 206 500 NULL

Example 2: Find the Next Patient

Scenario: You want to identify the patient scheduled next for each doctor.

Query:

SELECT
doctor_id,
appointment_id,
patient_id,
LEAD(patient_id) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS next_patient_id
FROM Appointments;

Output:

doctor_id appointment_id patient_id next_patient_id

301 201 101 103

301 203 103 102

301 205 102 104

301 208 104 NULL

302 202 105 106

302 206 106 NULL
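
A short runnable sketch of both LEAD queries, using Python's built-in sqlite3 module. The table layout and the sample rows (including the patient IDs) are assumptions reconstructed from the output tables in this section, and SQLite 3.25 or later is assumed. Rows are listed in appointment_id order within each doctor, which is the order LEAD uses here.

```python
import sqlite3

# Assumed sample rows (appointment_id, doctor_id, patient_id, fee).
sample_rows = [(201, 301, 101, 500), (202, 302, 105, 0), (203, 301, 103, 700),
               (205, 301, 102, 600), (206, 302, 106, 500), (208, 301, 104, 800)]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Appointments
                (appointment_id INTEGER, doctor_id INTEGER, patient_id INTEGER, fee INTEGER)""")
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?, ?)", sample_rows)

lead_rows = conn.execute("""
    SELECT doctor_id, appointment_id, fee,
           LEAD(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS next_appointment_fee,
           LEAD(patient_id) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS next_patient_id
    FROM Appointments
    ORDER BY doctor_id, appointment_id
""").fetchall()

for row in lead_rows:
    print(row)  # SQL NULL is shown as None for the last row of each doctor
```

The last appointment of each doctor has no following row, so LEAD returns NULL there.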

2. LAG Function

Definition:

The LAG function retrieves the value of a column from the previous row in the same partition. It is
often used to compare current rows with past rows.

Example 1: Find the Previous Appointment Fee

Scenario: You want to know the fee of the previous appointment for each doctor.

Query:

SELECT
doctor_id,
appointment_id,
fee,
LAG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS prev_appointment_fee
FROM Appointments;

Output:

doctor_id appointment_id fee prev_appointment_fee

301 201 500 NULL

301 203 700 500

301 205 600 700

301 208 800 600

302 202 0 NULL

302 206 500 0

Example 2: Find the Previous Patient

Scenario: You want to identify the patient scheduled before each current appointment for every
doctor.

Query:

SELECT
doctor_id,
appointment_id,
patient_id,
LAG(patient_id) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS prev_patient_id
FROM Appointments;

Output:

doctor_id appointment_id patient_id prev_patient_id

301 201 101 NULL

301 203 103 101

301 205 102 103

301 208 104 102

302 202 105 NULL

302 206 106 105
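
A common use of LAG is to compute the change from the previous row. The sketch below shows one way to do this with Python's built-in sqlite3 module; the fee_change column is a hypothetical addition for illustration, the table layout and sample rows are assumptions reconstructed from the output tables in this section, and SQLite 3.25 or later is assumed.

```python
import sqlite3

# Assumed sample rows (appointment_id, doctor_id, fee), reconstructed from the output tables.
sample_rows = [(201, 301, 500), (202, 302, 0), (203, 301, 700),
               (205, 301, 600), (206, 302, 500), (208, 301, 800)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)")
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

lag_rows = conn.execute("""
    SELECT doctor_id, appointment_id, fee,
           LAG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS prev_appointment_fee,
           -- Hypothetical extra column: difference from the previous appointment's fee.
           fee - LAG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS fee_change
    FROM Appointments
    ORDER BY doctor_id, appointment_id
""").fetchall()

for row in lag_rows:
    print(row)
```

For the first appointment of each doctor there is no previous row, so LAG returns NULL and the subtraction is NULL as well.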

3. FIRST_VALUE Function

Definition:

The FIRST_VALUE function retrieves the value of a column from the first row of the window frame.
With the default frame, this is the first row of the partition in ORDER BY order.

Example 1: Find the First Appointment Fee Per Doctor

Scenario: You want to know the fee of the first appointment for each doctor.

Query:

SELECT
doctor_id,
appointment_id,
fee,
FIRST_VALUE(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS first_appointment_fee
FROM Appointments;

Output:

doctor_id appointment_id fee first_appointment_fee

301 201 500 500

301 203 700 500

301 205 600 500

301 208 800 500

302 202 0 0

302 206 500 0

Example 2: Find the First Patient Per Doctor

Scenario: You want to identify the first patient scheduled for each doctor.

Query:

SELECT
doctor_id,
appointment_id,
patient_id,
FIRST_VALUE(patient_id) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS first_patient_id
FROM Appointments;

Output:

doctor_id appointment_id patient_id first_patient_id

301 201 101 101

301 203 103 101

301 205 102 101

301 208 104 101

302 202 105 105

302 206 106 105
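
LAST_VALUE, mentioned in the introduction to this section, works like FIRST_VALUE but reads the last row of the window frame. The sketch below, using Python's built-in sqlite3 module, illustrates a frequent pitfall: with the default frame, which ends at the current row, LAST_VALUE simply returns the current row's value, so an explicit frame is needed to reach the end of the partition. The table layout and sample rows are assumptions reconstructed from the output tables in this section, and SQLite 3.25 or later is assumed.

```python
import sqlite3

# Assumed sample rows (appointment_id, doctor_id, fee), reconstructed from the output tables.
sample_rows = [(201, 301, 500), (202, 302, 0), (203, 301, 700),
               (205, 301, 600), (206, 302, 500), (208, 301, 800)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)")
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

first_last_rows = conn.execute("""
    SELECT doctor_id, appointment_id, fee,
           FIRST_VALUE(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS first_fee,
           -- Default frame stops at the current row, so this repeats the current fee.
           LAST_VALUE(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS last_fee_default,
           -- Explicit frame covering the whole partition gives the true last fee.
           LAST_VALUE(fee) OVER (
               PARTITION BY doctor_id ORDER BY appointment_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_fee_partition
    FROM Appointments
    ORDER BY doctor_id, appointment_id
""").fetchall()

for row in first_last_rows:
    print(row)
```

With these rows, last_fee_partition is 800 for doctor 301 and 500 for doctor 302, while last_fee_default always equals the fee column.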

4. NTILE Function

Definition:

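The NTILE function divides the rows in a partition into a specified number of groups that are as
equal in size as possible and assigns a bucket number to each row. When the rows cannot be divided
evenly, the earlier groups receive one extra row.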

Example 1: Divide Appointments into 2 Groups by Fee

Scenario: You want to divide appointments for each doctor into two equal groups based on their
fees.

Query:

SELECT
doctor_id,
appointment_id,
fee,
NTILE(2) OVER (PARTITION BY doctor_id ORDER BY fee) AS fee_group
FROM Appointments;

Output:

doctor_id appointment_id fee fee_group

301 201 500 1

301 205 600 1

301 203 700 2

301 208 800 2

302 202 0 1

302 206 500 2

Example 2: Divide Appointments into 3 Groups by Appointment ID

Scenario: You want to divide appointments into three groups based on their appointment_id.

Query:

SELECT
doctor_id,
appointment_id,
fee,
NTILE(3) OVER (ORDER BY appointment_id) AS appointment_group
FROM Appointments;

Output:

doctor_id appointment_id fee appointment_group

301 201 500 1

302 202 0 1

301 203 700 2

301 205 600 2

302 206 500 3

301 208 800 3
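
The sketch below, using Python's built-in sqlite3 module, runs NTILE over all six appointments and shows what happens when the rows do not divide evenly. The NTILE(4) column is a hypothetical addition for illustration, the table layout and sample rows are assumptions reconstructed from the output tables in this section, and SQLite 3.25 or later is assumed.

```python
import sqlite3

# Assumed sample rows (appointment_id, doctor_id, fee), reconstructed from the output tables.
sample_rows = [(201, 301, 500), (202, 302, 0), (203, 301, 700),
               (205, 301, 600), (206, 302, 500), (208, 301, 800)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)")
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

ntile_rows = conn.execute("""
    SELECT appointment_id,
           -- 6 rows into 3 buckets: two rows each.
           NTILE(3) OVER (ORDER BY appointment_id) AS group_of_3,
           -- 6 rows into 4 buckets: the first two buckets get the extra rows.
           NTILE(4) OVER (ORDER BY appointment_id) AS group_of_4
    FROM Appointments
    ORDER BY appointment_id
""").fetchall()

for row in ntile_rows:
    print(row)
```

Because there is no PARTITION BY, appointments of both doctors are ranked together by appointment_id.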

These examples highlight how analytic functions provide valuable insights by comparing and
analyzing data across rows.


You said:
Explain the frame clause (RANGE BETWEEN and ROWS BETWEEN) in SQL, in the same format as above.

ChatGPT said:

Frame Clause: RANGE BETWEEN and ROWS BETWEEN

Definition:

The frame clause specifies a subset of rows relative to the current row within a window. It is used in
window functions to define which rows are included in the computation. The two primary clauses
are:

1. RANGE BETWEEN: Operates on the logical values of rows (the ORDER BY values). Rows that
have the same ORDER BY value as the current row are always included together.
2. ROWS BETWEEN: Operates on the physical row positions within the partition.

1. RANGE BETWEEN

Definition:

Defines a range of rows based on values in the ORDER BY clause. It is used for scenarios like
cumulative sums or averages over a specific value range.

Example 1: Cumulative Sum of Fees


Scenario: You want to calculate the cumulative sum of fees for each doctor up to the current
appointment.

Query:

SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (
PARTITION BY doctor_id
ORDER BY appointment_id
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_fee
FROM Appointments;

Explanation:

● RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Includes all
rows from the start of the partition up to the current row based on ORDER BY
appointment_id.

Output:

doctor_id appointment_id fee cumulative_fee

301 201 500 500

301 203 700 1200

301 205 600 1800

301 208 800 2600

302 202 0 0

302 206 500 500

Example 2: Average Fee within a Range


Scenario: You want to calculate the average fee within a range of appointments for each doctor.

Query:

SELECT
doctor_id,
appointment_id,
fee,
AVG(fee) OVER (
PARTITION BY doctor_id
ORDER BY appointment_id
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS avg_fee_range
FROM Appointments;

Explanation:

● RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING: Includes
the current row and all subsequent rows.

Output:

doctor_id appointment_id fee avg_fee_range

301 201 500 650

301 203 700 700

301 205 600 700

301 208 800 800

302 202 0 250

302 206 500 500
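
Both RANGE queries can be checked with a small runnable sketch using Python's built-in sqlite3 module. The table layout and sample rows are assumptions reconstructed from the output tables in this section, and SQLite 3.25 or later is assumed.

```python
import sqlite3

# Assumed sample rows (appointment_id, doctor_id, fee), reconstructed from the output tables.
sample_rows = [(201, 301, 500), (202, 302, 0), (203, 301, 700),
               (205, 301, 600), (206, 302, 500), (208, 301, 800)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)")
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

range_rows = conn.execute("""
    SELECT doctor_id, appointment_id, fee,
           -- Everything from the start of the partition up to the current row.
           SUM(fee) OVER (
               PARTITION BY doctor_id ORDER BY appointment_id
               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cumulative_fee,
           -- The current row and everything after it in the partition.
           AVG(fee) OVER (
               PARTITION BY doctor_id ORDER BY appointment_id
               RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
           ) AS avg_fee_range
    FROM Appointments
    ORDER BY doctor_id, appointment_id
""").fetchall()

for row in range_rows:
    print(row)
```

Since appointment_id is unique, no rows tie in the ORDER BY, and RANGE behaves the same as ROWS would for these two frames.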

2. ROWS BETWEEN

Definition:

Defines a frame based on the physical number of rows around the current row, regardless of their
values.

Example 1: Rolling Sum of Fees (Previous and Current Row)

Scenario: You want to calculate the rolling sum of fees for the current row and the previous row.

Query:

SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (
PARTITION BY doctor_id
ORDER BY appointment_id
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
) AS rolling_sum_fee
FROM Appointments;

Explanation:

● ROWS BETWEEN 1 PRECEDING AND CURRENT ROW: Includes the current row
and one row preceding it.

Output:

doctor_id appointment_id fee rolling_sum_fee

301 201 500 500

301 203 700 1200

301 205 600 1300

301 208 800 1400

302 202 0 0

302 206 500 500


Example 2: Rolling Average of Fees (2 Rows Following)

Scenario: You want to calculate the average fee for the current row and the next two rows.

Query:

SELECT
doctor_id,
appointment_id,
fee,
AVG(fee) OVER (
PARTITION BY doctor_id
ORDER BY appointment_id
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
) AS rolling_avg_fee
FROM Appointments;

Explanation:

● ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING: Includes the current row
and the next two rows.

Output:

doctor_id appointment_id fee rolling_avg_fee

301 201 500 600

301 203 700 700

301 205 600 700

301 208 800 800

302 202 0 250

302 206 500 500
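
The two ROWS frames can be tried out in a short runnable sketch using Python's built-in sqlite3 module. The table layout and sample rows are assumptions reconstructed from the output tables in this section, and SQLite 3.25 or later is assumed.

```python
import sqlite3

# Assumed sample rows (appointment_id, doctor_id, fee), reconstructed from the output tables.
sample_rows = [(201, 301, 500), (202, 302, 0), (203, 301, 700),
               (205, 301, 600), (206, 302, 500), (208, 301, 800)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)")
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

rolling_rows = conn.execute("""
    SELECT doctor_id, appointment_id, fee,
           -- Previous row plus current row.
           SUM(fee) OVER (
               PARTITION BY doctor_id ORDER BY appointment_id
               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
           ) AS rolling_sum_fee,
           -- Current row plus up to two rows after it.
           AVG(fee) OVER (
               PARTITION BY doctor_id ORDER BY appointment_id
               ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
           ) AS rolling_avg_fee
    FROM Appointments
    ORDER BY doctor_id, appointment_id
""").fetchall()

for row in rolling_rows:
    print(row)
```

Near the edges of a partition the frame simply contains fewer rows, which is why the first rolling sum equals the fee itself.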


Key Differences Between RANGE and ROWS:

Feature | RANGE | ROWS

Scope | Based on logical values in ORDER BY; rows that tie with the current row are included together. | Based on physical row positions.

Use Case | Aggregates over value ranges. | Aggregates over row positions.

Performance | Can be slower on large datasets, depending on the database engine. | Often faster for row-based analysis.
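
The difference between the two frame types only becomes visible when the ORDER BY values contain ties. The sketch below, using Python's built-in sqlite3 module, orders all appointments by fee, where appointments 201 and 206 both have a fee of 500. The table layout and sample rows are assumptions reconstructed from the output tables in this section, and SQLite 3.25 or later is assumed.

```python
import sqlite3

# Assumed sample rows (appointment_id, doctor_id, fee), reconstructed from the output tables.
sample_rows = [(201, 301, 500), (202, 302, 0), (203, 301, 700),
               (205, 301, 600), (206, 302, 500), (208, 301, 800)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)")
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

tie_rows = conn.execute("""
    SELECT appointment_id, fee,
           -- RANGE: both 500-fee rows are peers, so each sees the other.
           SUM(fee) OVER (ORDER BY fee
                          RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum,
           -- ROWS: counts physical rows, so the two 500-fee rows get different sums.
           SUM(fee) OVER (ORDER BY fee
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum
    FROM Appointments
    ORDER BY fee
""").fetchall()

for row in tie_rows:
    print(row)

range_sum_by_id = {appointment_id: range_sum for appointment_id, _, range_sum, _ in tie_rows}
```

With RANGE, both tied appointments show a running sum of 1000. With ROWS, one of them shows 500 and the other 1000, and which one comes first is not defined unless the ORDER BY includes a tie-breaker.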
