Window Functions - Realtime Examples
GritSetGrow - GSGLearn.com
www.gsglearn.com
Introduction to Window Functions
Window functions perform calculations across a set of rows related to the current row
while retaining the original granularity of the dataset. They are essential for ranking,
row-to-row comparisons, running totals, moving averages, and percentile analysis.
Unlike aggregation with GROUP BY (e.g., SUM, AVG), window functions do not collapse rows
into summaries. Instead, they compute a value for each row based on a defined window of
rows.
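The difference between collapsing rows and keeping them can be seen on a tiny table. The following is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions), a cut-down stock_data table, and invented prices; it is an illustration, not part of the original dataset.

```python
import sqlite3

# Hypothetical prices, invented purely for illustration.
rows = [
    ("AAPL", "2024-01-02", 100.0),
    ("AAPL", "2024-01-03", 110.0),
    ("MSFT", "2024-01-02", 200.0),
    ("MSFT", "2024-01-03", 220.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_data (stock_symbol TEXT, date TEXT, closing_price REAL)"
)
conn.executemany("INSERT INTO stock_data VALUES (?, ?, ?)", rows)

# Aggregate with GROUP BY: each stock collapses to a single summary row.
grouped = conn.execute("""
    SELECT stock_symbol, AVG(closing_price)
    FROM stock_data
    GROUP BY stock_symbol
    ORDER BY stock_symbol
""").fetchall()

# Window function: every input row is kept and annotated with the
# average of its partition.
windowed = conn.execute("""
    SELECT stock_symbol, date, closing_price,
           AVG(closing_price) OVER (PARTITION BY stock_symbol) AS avg_close
    FROM stock_data
    ORDER BY stock_symbol, date
""").fetchall()

print(len(grouped), "rows from GROUP BY")
print(len(windowed), "rows from the window query")
for row in windowed:
    print(row)
```

The GROUP BY query returns two rows, while the window query returns all four input rows, each carrying its stock's average alongside the original values.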
Sample Data
-- Create table
CREATE TABLE stock_data (
    stock_symbol VARCHAR(10),
    sector VARCHAR(50), -- referenced by the sector-level queries in this document
    date DATE,
    open_price DECIMAL(10,2),
    high_price DECIMAL(10,2),
    low_price DECIMAL(10,2),
    closing_price DECIMAL(10,2),
    volume INT,
    dividend DECIMAL(10,2)
);
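1. Ranking Functions
a. ROW_NUMBER()
Assigns a unique sequential number to each row within its partition.
Use Case: Number each stock's trading days in date order.
SELECT
    stock_symbol,
    date,
    closing_price,
    ROW_NUMBER() OVER (
        PARTITION BY stock_symbol
        ORDER BY date
    ) AS trading_day_sequence
FROM stock_data;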
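Output Explanation:
trading_day_sequence starts at 1 on each stock's earliest date and increases by 1 with each later row; numbering restarts for every stock_symbol.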
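b. RANK() and DENSE_RANK()
RANK() leaves gaps in the sequence after ties; DENSE_RANK() does not.
Use Case: Rank stocks by trading volume on each date.
SELECT
    date,
    stock_symbol,
    volume,
    RANK() OVER (PARTITION BY date ORDER BY volume DESC) AS volume_rank,
    DENSE_RANK() OVER (PARTITION BY date ORDER BY volume DESC) AS dense_volume_rank
FROM stock_data;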
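How the two ranking functions treat ties is easiest to see on a handful of rows. This is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25+) and invented volumes in which two stocks deliberately tie.

```python
import sqlite3

# Hypothetical volumes for a single date; AAPL and MSFT tie on purpose.
rows = [
    ("2024-01-02", "AAPL", 500),
    ("2024-01-02", "MSFT", 500),
    ("2024-01-02", "GOOGL", 300),
    ("2024-01-02", "AMZN", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_data (date TEXT, stock_symbol TEXT, volume INTEGER)"
)
conn.executemany("INSERT INTO stock_data VALUES (?, ?, ?)", rows)

ranked = conn.execute("""
    SELECT stock_symbol, volume,
           RANK()       OVER (PARTITION BY date ORDER BY volume DESC) AS volume_rank,
           DENSE_RANK() OVER (PARTITION BY date ORDER BY volume DESC) AS dense_volume_rank
    FROM stock_data
    ORDER BY volume DESC, stock_symbol
""").fetchall()

for row in ranked:
    print(row)
# ('AAPL', 500, 1, 1)
# ('MSFT', 500, 1, 1)
# ('GOOGL', 300, 3, 2)   <- RANK skips 2 after the tie, DENSE_RANK does not
# ('AMZN', 100, 4, 3)
```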
c. NTILE(n)
Divides rows into n buckets (e.g., quartiles, deciles).
Use Case: Segment each stock's trading days into quartiles by daily price range (high_price - low_price).
SELECT
    stock_symbol,
    date,
    closing_price,
    NTILE(4) OVER (
        PARTITION BY stock_symbol
        ORDER BY (high_price - low_price) DESC
    ) AS volatility_quartile
FROM stock_data;
Insight:
Within each stock, days in quartile 1 have the widest daily price swings (highest volatility).
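When the number of rows is not a multiple of n, NTILE(n) makes the earlier buckets one row larger. The sketch below illustrates this with six invented trading days for one stock, assuming Python's built-in sqlite3 module (SQLite 3.25+); the prices are hypothetical.

```python
import sqlite3

# Hypothetical (date, high_price, low_price) rows; daily ranges are 6, 5, 4, 3, 2, 1.
rows = [
    ("AAPL", "2024-01-02", 106.0, 100.0),
    ("AAPL", "2024-01-03", 105.0, 100.0),
    ("AAPL", "2024-01-04", 104.0, 100.0),
    ("AAPL", "2024-01-05", 103.0, 100.0),
    ("AAPL", "2024-01-08", 102.0, 100.0),
    ("AAPL", "2024-01-09", 101.0, 100.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_data "
    "(stock_symbol TEXT, date TEXT, high_price REAL, low_price REAL)"
)
conn.executemany("INSERT INTO stock_data VALUES (?, ?, ?, ?)", rows)

quartiles = conn.execute("""
    SELECT date,
           high_price - low_price AS daily_range,
           NTILE(4) OVER (
               PARTITION BY stock_symbol
               ORDER BY (high_price - low_price) DESC
           ) AS volatility_quartile
    FROM stock_data
    ORDER BY daily_range DESC
""").fetchall()

for row in quartiles:
    print(row)
# Six rows split into four buckets of sizes 2, 2, 1, 1.
```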
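2. LAG() and LEAD()
LAG() reads a value from an earlier row in the same partition; LEAD() reads one from a later row.
Use Case: Compute day-over-day price changes and look five trading days ahead.
SELECT
    stock_symbol,
    date,
    closing_price,
    LAG(closing_price, 1) OVER (PARTITION BY stock_symbol ORDER BY date) AS prev_close,
    closing_price - LAG(closing_price, 1) OVER (PARTITION BY stock_symbol ORDER BY date) AS dod_change,
    LEAD(closing_price, 5) OVER (PARTITION BY stock_symbol ORDER BY date) AS next_5day_close
FROM stock_data;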
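Output Analysis:
prev_close and dod_change are NULL on each stock's first row, and next_5day_close is NULL on its last five rows, because no row exists at that offset within the partition.
Insight:
Comparing closing_price with next_5day_close tracks performance over five trading days (e.g., a +5% increase from one Monday to the next).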
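The NULLs that LAG() and LEAD() produce at the edges of a partition are worth seeing once. This is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25+) and seven invented closing prices for one stock.

```python
import sqlite3

# Hypothetical closing prices for seven consecutive trading days.
rows = [
    ("AAPL", "2024-01-02", 100.0),
    ("AAPL", "2024-01-03", 102.0),
    ("AAPL", "2024-01-04", 101.0),
    ("AAPL", "2024-01-05", 105.0),
    ("AAPL", "2024-01-08", 107.0),
    ("AAPL", "2024-01-09", 106.0),
    ("AAPL", "2024-01-10", 110.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_data (stock_symbol TEXT, date TEXT, closing_price REAL)"
)
conn.executemany("INSERT INTO stock_data VALUES (?, ?, ?)", rows)

changes = conn.execute("""
    SELECT date,
           closing_price,
           LAG(closing_price, 1) OVER (PARTITION BY stock_symbol ORDER BY date) AS prev_close,
           closing_price
               - LAG(closing_price, 1) OVER (PARTITION BY stock_symbol ORDER BY date) AS dod_change,
           LEAD(closing_price, 5) OVER (PARTITION BY stock_symbol ORDER BY date) AS next_5day_close
    FROM stock_data
    ORDER BY date
""").fetchall()

for row in changes:
    print(row)
# The first row has no previous close (None); only the first two rows
# have a row five positions ahead, so next_5day_close is None afterwards.
```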
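3. Aggregate Functions with Window Frames
a. Moving Average
Use Case: Smooth closing prices with a 7-day moving average.
SELECT
    stock_symbol,
    date,
    closing_price,
    AVG(closing_price) OVER (
        PARTITION BY stock_symbol
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7day
FROM stock_data;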
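Output:
moving_avg_7day averages the current row and up to six preceding rows. On a stock's first six rows fewer than seven rows are available, so the average covers a shorter window.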
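Because a ROWS frame simply shrinks at the start of a partition, it can help to report how many rows each average actually covers. The sketch below uses a shorter 3-row frame so the effect shows up with only four invented prices; it assumes Python's built-in sqlite3 module (SQLite 3.25+), and the rows_in_frame column is a hypothetical helper, not part of the original query.

```python
import sqlite3

# Hypothetical closing prices, invented for illustration.
rows = [
    ("AAPL", "2024-01-02", 10.0),
    ("AAPL", "2024-01-03", 20.0),
    ("AAPL", "2024-01-04", 30.0),
    ("AAPL", "2024-01-05", 40.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_data (stock_symbol TEXT, date TEXT, closing_price REAL)"
)
conn.executemany("INSERT INTO stock_data VALUES (?, ?, ?)", rows)

# A named window (WINDOW w AS ...) lets AVG and COUNT share one frame definition.
moving = conn.execute("""
    SELECT date,
           closing_price,
           AVG(closing_price) OVER w AS moving_avg_3day,
           COUNT(*)           OVER w AS rows_in_frame
    FROM stock_data
    WINDOW w AS (
        PARTITION BY stock_symbol
        ORDER BY date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    )
    ORDER BY date
""").fetchall()

for row in moving:
    print(row)
# ('2024-01-02', 10.0, 10.0, 1)   <- only one row in the frame
# ('2024-01-03', 20.0, 15.0, 2)
# ('2024-01-04', 30.0, 20.0, 3)   <- first full 3-row window
# ('2024-01-05', 40.0, 30.0, 3)
```

Rows where rows_in_frame is smaller than the intended window size can then be filtered out or flagged before the averages are compared.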
b. Cumulative Sum
Use Case: Track cumulative dividends paid over time.
SELECT
    stock_symbol,
    date,
    dividend,
    SUM(dividend) OVER (
        PARTITION BY stock_symbol
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_dividend
FROM stock_data;
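Insight:
cumulative_dividend is the running total of dividend for each stock, up to and including the current date.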
c. Rolling Sum
Use Case: Total trading volume over the last three trading days.
SELECT
    stock_symbol,
    date,
    volume,
    SUM(volume) OVER (
        PARTITION BY stock_symbol
        ORDER BY date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_3day_volume
FROM stock_data;
Business Use:
Identify spikes in trading activity (e.g., rolling volume > 300,000 shares).
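Filtering on a rolling total needs one extra step: a window function cannot be used in the WHERE clause of the query that computes it, so the window query is wrapped in a CTE (or subquery) and filtered from outside. The following is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25+) and invented volumes; the 300,000 threshold is the one from the example above.

```python
import sqlite3

# Hypothetical daily volumes, invented for illustration.
rows = [
    ("AAPL", "2024-01-02", 50000),
    ("AAPL", "2024-01-03", 100000),
    ("AAPL", "2024-01-04", 120000),
    ("AAPL", "2024-01-05", 150000),
    ("AAPL", "2024-01-08", 90000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_data (stock_symbol TEXT, date TEXT, volume INTEGER)"
)
conn.executemany("INSERT INTO stock_data VALUES (?, ?, ?)", rows)

# Step 1 (CTE): compute the rolling sum. Step 2 (outer query): filter on it.
spikes = conn.execute("""
    WITH rolling AS (
        SELECT stock_symbol,
               date,
               SUM(volume) OVER (
                   PARTITION BY stock_symbol
                   ORDER BY date
                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
               ) AS rolling_3day_volume
        FROM stock_data
    )
    SELECT date, rolling_3day_volume
    FROM rolling
    WHERE rolling_3day_volume > 300000
    ORDER BY date
""").fetchall()

for row in spikes:
    print(row)
# ('2024-01-05', 370000)
# ('2024-01-08', 360000)
```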
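4. Distribution Functions
a. PERCENT_RANK()
Calculates the relative rank of a row as (rank - 1) / (rows in partition - 1).
Use Case: Locate each AAPL closing price within its historical range.
SELECT
    stock_symbol,
    date,
    closing_price,
    PERCENT_RANK() OVER (
        PARTITION BY stock_symbol
        ORDER BY closing_price
    ) AS price_percentile
FROM stock_data
WHERE stock_symbol = 'AAPL';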
Interpretation:
A percentile of 0.9 means the price is higher than 90% of the other closing prices in the partition.
b. CUME_DIST()
Calculates the cumulative distribution of a value (proportion of rows ≤ current value).
Use Case: Analyze the distribution of GOOGL's daily trading volumes.
SELECT
    stock_symbol,
    date,
    volume,
    CUME_DIST() OVER (
        PARTITION BY stock_symbol
        ORDER BY volume
    ) AS volume_cume_dist
FROM stock_data
WHERE stock_symbol = 'GOOGL';
Output:
A CUME_DIST of 0.7 means 70% of days had volumes ≤ current day's volume.
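PERCENT_RANK() and CUME_DIST() look similar but are computed differently: the first is (rank - 1) / (rows - 1), the second is the fraction of rows less than or equal to the current value. The sketch below puts them side by side on five invented volumes, assuming Python's built-in sqlite3 module (SQLite 3.25+).

```python
import sqlite3

# Hypothetical daily volumes for one stock, invented for illustration.
rows = [
    ("GOOGL", "2024-01-02", 100),
    ("GOOGL", "2024-01-03", 200),
    ("GOOGL", "2024-01-04", 300),
    ("GOOGL", "2024-01-05", 400),
    ("GOOGL", "2024-01-08", 500),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_data (stock_symbol TEXT, date TEXT, volume INTEGER)"
)
conn.executemany("INSERT INTO stock_data VALUES (?, ?, ?)", rows)

distribution = conn.execute("""
    SELECT volume,
           PERCENT_RANK() OVER (PARTITION BY stock_symbol ORDER BY volume) AS volume_percent_rank,
           CUME_DIST()    OVER (PARTITION BY stock_symbol ORDER BY volume) AS volume_cume_dist
    FROM stock_data
    WHERE stock_symbol = 'GOOGL'
    ORDER BY volume
""").fetchall()

for volume, percent_rank, cume_dist in distribution:
    print(volume, round(percent_rank, 2), round(cume_dist, 2))
# 100 0.0 0.2
# 200 0.25 0.4
# 300 0.5 0.6
# 400 0.75 0.8
# 500 1.0 1.0
```

Note that the lowest value has a PERCENT_RANK of 0 but a CUME_DIST of 1/5, because CUME_DIST counts the current row itself.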
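5. Advanced Use Cases
Use Case: Average the closing price over the three prior trading days, excluding the current row.
SELECT
    stock_symbol,
    date,
    closing_price,
    AVG(closing_price) OVER (
        PARTITION BY stock_symbol
        ORDER BY date
        ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING
    ) AS avg_prior_3days
FROM stock_data;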
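Use Case: Rank stocks by trading volume within each sector on each date (requires a sector column in stock_data).
SELECT
    stock_symbol,
    sector,
    date,
    volume,
    RANK() OVER (
        PARTITION BY sector, date
        ORDER BY volume DESC
    ) AS sector_volume_rank
FROM stock_data;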
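Use Case: Compare each row's dividend yield with the average yield of its sector (requires a sector column in stock_data).
SELECT
    stock_symbol,
    date,
    dividend,
    closing_price,
    (dividend / closing_price) * 100 AS dividend_yield,
    AVG(dividend / closing_price) OVER (PARTITION BY sector) * 100 AS sector_avg_yield
FROM stock_data;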
6. Performance Optimization
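Indexing: Add a composite index on (stock_symbol, date); this may let the engine read rows already in PARTITION BY / ORDER BY order instead of sorting them.
Frame Size: Use narrow frames (e.g., ROWS 10 PRECEDING) for large datasets.
Avoid RANGE: Prefer ROWS over RANGE for precise control. ROWS counts physical rows, while RANGE also includes every row that ties with the current row on the ORDER BY value; on many engines ROWS frames also execute faster.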
7. Common Pitfalls
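1. Omitting ORDER BY: Without ORDER BY, the frame is the whole partition, so a "cumulative" sum returns the partition total on every row, and ranking functions have no meaningful order to rank by.
2. Over-Partitioning: Partitioning by more columns than the analysis needs creates many small partitions and can slow down queries.
3. Ambiguous Window Frames: Always define ROWS / RANGE explicitly. With ORDER BY and no frame clause, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows that tie on the ORDER BY value as one group.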
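The first and third pitfalls can be reproduced with a single query. This is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25+) and invented dividend rows in which two records deliberately share a date to create a tie; the column aliases are hypothetical.

```python
import sqlite3

# Hypothetical dividend records; two rows share 2024-01-03 to create a tie.
rows = [
    ("AAPL", "2024-01-02", 1.0),
    ("AAPL", "2024-01-03", 2.0),
    ("AAPL", "2024-01-03", 3.0),
    ("AAPL", "2024-01-04", 4.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_data (stock_symbol TEXT, date TEXT, dividend REAL)"
)
conn.executemany("INSERT INTO stock_data VALUES (?, ?, ?)", rows)

sums = conn.execute("""
    SELECT date,
           dividend,
           -- No ORDER BY: the frame is the whole partition.
           SUM(dividend) OVER (PARTITION BY stock_symbol) AS no_order_by,
           -- ORDER BY without a frame: default RANGE frame includes tied rows.
           SUM(dividend) OVER (PARTITION BY stock_symbol ORDER BY date) AS default_frame,
           -- Explicit ROWS frame plus a tiebreaker column: a true running total.
           SUM(dividend) OVER (
               PARTITION BY stock_symbol
               ORDER BY date, dividend
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS explicit_rows
    FROM stock_data
    ORDER BY date, dividend
""").fetchall()

for row in sums:
    print(row)
# ('2024-01-02', 1.0, 10.0, 1.0, 1.0)
# ('2024-01-03', 2.0, 10.0, 6.0, 3.0)   <- default frame jumps to 6.0 on both tied rows
# ('2024-01-03', 3.0, 10.0, 6.0, 6.0)
# ('2024-01-04', 4.0, 10.0, 10.0, 10.0)
```

Adding a second column to the window's ORDER BY is one way to make the row order, and therefore the running total, deterministic when the main ordering column has duplicates.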
8. Conclusion
Window functions empower financial analysts to rank securities, compare each row with
earlier or later rows, and compute running and moving aggregates.
By combining partitioning, ordering, and custom frames, they unlock granular insights
without collapsing rows. For instance, a hedge fund might use RANK() to identify
outperforming stocks or LAG() to model mean-reversion strategies. Mastery of window
functions is critical for modern financial analytics.
Shwetank Singh
GritSetGrow - GSGLearn.com