The LAG() function in SQL is one of the most powerful and flexible tools available for performing advanced data analysis. It is often used to compare rows, calculate differences, and track trends in a dataset, especially for time-series data.
Whether we are working with sales, stock prices, or employee performance metrics, the LAG() function lets us read a value from an earlier row alongside the current one. In this article, we will explain the function in detail and see how we can use it effectively in our SQL queries.
What is the SQL LAG() Function?
The SQL LAG() function is a window function that allows us to retrieve the value of a column from a previous row in the result set. Unlike aggregate functions (such as SUM(), AVG(), etc.), the LAG() function does not collapse the result set. Instead, it returns a value for each row based on a specific window or partition of the data. It gives us a powerful way to compare rows and analyze changes in values over time.
Syntax:
LAG (scalar_expression [, offset [, default ]]) OVER ( [ partition_by_clause ] order_by_clause )
Key Terms
- scalar_expression: The value to be returned from the row found at the specified offset.
- offset: The number of rows back from the current row from which to obtain a value. If not specified, the default is 1.
- default: The value to be returned if the offset goes beyond the scope of the partition. If a default value is not specified, NULL is returned.
- partition_by_clause: An optional clause that divides the result set into partitions. The LAG() function is applied to each partition separately.
- order_by_clause: The order of the rows within each partition. This is mandatory and must be specified.
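To make the offset and default arguments concrete, here is a minimal sketch. The inline (Id, Amount) rows and the column aliases are made up for illustration and are not part of the article's data. The derived-table VALUES syntax is assumed to be available, as it is in SQL Server and PostgreSQL.

```sql
-- Hypothetical inline data: four (Id, Amount) rows used only for illustration.
SELECT Id, Amount,
       LAG(Amount)       OVER (ORDER BY Id) AS Prev1,       -- offset omitted, so 1 row back; NULL if none
       LAG(Amount, 2)    OVER (ORDER BY Id) AS Prev2,       -- 2 rows back; NULL if none
       LAG(Amount, 2, 0) OVER (ORDER BY Id) AS Prev2OrZero  -- 2 rows back; 0 if none
FROM (VALUES (1, 10), (2, 20), (3, 30), (4, 40)) AS T(Id, Amount)
ORDER BY Id;
```

For the row with Id = 3, Prev1 is 20 and both Prev2 columns are 10. For the rows with Id = 1 and Id = 2, no row exists two positions back, so Prev2 is NULL while Prev2OrZero falls back to the supplied default of 0.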
Why Use the LAG() Function?
The LAG() function is especially useful when we need to compare the current row's value with a previous row's value. This is essential in tasks like:
- Comparing Rows: Helps compare the current row with a previous row's data.
- Trend Analysis: Useful for analyzing changes in values, like stock prices, sales figures, or other time-series data.
- Finding Differences: Calculate the difference between consecutive rows in terms of time, quantity, or any other metric.
Example 1: Basic Usage of LAG()
Let's look at some examples to understand how to use the LAG() function in SQL. Suppose we want to track the revenue of a news organization over the years, comparing each year’s revenue to the previous year’s revenue.
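The queries in this article read from a table named Org, whose definition is not shown. A setup sketch along the following lines would reproduce the rows that appear in the outputs. The column types are assumptions; the values are taken from the output tables. The square brackets around Year follow the article's SQL Server style, and other databases may need a different quoting style.

```sql
-- Assumed schema: the article does not give the column types.
CREATE TABLE Org (
    Organisation VARCHAR(50) NOT NULL,
    [Year]       INT         NOT NULL,
    Revenue      INT         NOT NULL
);

-- Rows taken from the article's output tables.
INSERT INTO Org (Organisation, [Year], Revenue) VALUES
    ('ABCD News', 2013, 440000),
    ('ABCD News', 2014, 480000),
    ('ABCD News', 2015, 490000),
    ('ABCD News', 2016, 500000),
    ('ABCD News', 2017, 520000),
    ('ABCD News', 2018, 525000),
    ('ABCD News', 2019, 540000),
    ('ABCD News', 2020, 550000),
    ('Z News',    2016, 720000),
    ('Z News',    2017, 750000),
    ('Z News',    2018, 780000),
    ('Z News',    2019, 880000),
    ('Z News',    2020, 910000);
```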
Query:
SELECT Organisation, [Year], Revenue,
LAG (Revenue, 1, 0)
OVER (PARTITION BY Organisation ORDER BY [Year]) AS PrevYearRevenue
FROM Org
ORDER BY Organisation, [Year];
Output:
Organisation | Year | Revenue | PrevYearRevenue |
---|---|---|---|
ABCD News | 2013 | 440000 | 0 |
ABCD News | 2014 | 480000 | 440000 |
ABCD News | 2015 | 490000 | 480000 |
ABCD News | 2016 | 500000 | 490000 |
ABCD News | 2017 | 520000 | 500000 |
ABCD News | 2018 | 525000 | 520000 |
ABCD News | 2019 | 540000 | 525000 |
ABCD News | 2020 | 550000 | 540000 |
Z News | 2016 | 720000 | 0 |
Z News | 2017 | 750000 | 720000 |
Z News | 2018 | 780000 | 750000 |
Z News | 2019 | 880000 | 780000 |
Z News | 2020 | 910000 | 880000 |
Explanation:
- In the above example, we have two TV news channels whose current and previous year's revenue are presented on the same row using the LAG() function.
- The very first record for each channel has no previous year's revenue, so it shows the default value of 0.
- This function is very useful for producing data for BI reports that compare values in consecutive periods, e.g. year-on-year, quarter-on-quarter, or daily comparisons.
Example 2: Calculate Year-on-Year Growth
Now, let’s expand on the first example and calculate the year-on-year (YoY) growth for each organization. We'll subtract the PrevYearRevenue from the current Revenue to get the growth.
Query:
SELECT Z.*, (Z.Revenue - Z.PrevYearRevenue) AS YearOnYearGrowth
FROM (SELECT Organisation, [Year], Revenue,
LAG (Revenue, 1)
OVER (PARTITION BY Organisation ORDER BY [Year]) AS PrevYearRevenue
FROM Org) Z ORDER BY Organisation, [Year];
Output:
Organisation | Year | Revenue | PrevYearRevenue | YearOnYearGrowth |
---|---|---|---|---|
ABCD News | 2013 | 440000 | NULL | NULL |
ABCD News | 2014 | 480000 | 440000 | 40000 |
ABCD News | 2015 | 490000 | 480000 | 10000 |
ABCD News | 2016 | 500000 | 490000 | 10000 |
ABCD News | 2017 | 520000 | 500000 | 20000 |
ABCD News | 2018 | 525000 | 520000 | 5000 |
ABCD News | 2019 | 540000 | 525000 | 15000 |
ABCD News | 2020 | 550000 | 540000 | 10000 |
Z News | 2016 | 720000 | NULL | NULL |
Z News | 2017 | 750000 | 720000 | 30000 |
Z News | 2018 | 780000 | 750000 | 30000 |
Z News | 2019 | 880000 | 780000 | 100000 |
Z News | 2020 | 910000 | 880000 | 30000 |
Explanation:
- In the above example, we calculate the year-on-year growth for each TV news channel by subtracting the previous year's revenue from the current year's revenue.
- Here we haven't supplied a default parameter to LAG(), so it returns NULL when there is no previous row, and the growth for that row is NULL as well.
- Because LAG() runs at the database level, BI reporting solutions like Power BI and Tableau can avoid cumbersome measures at the reporting layer.
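The same pattern extends to growth expressed as a percentage. The sketch below is a variation on the query above, not part of the original example. It assumes the Org table used throughout the article, and the YoYGrowthPct alias is hypothetical. NULLIF guards against division by zero, and multiplying by 100.0 avoids integer division if Revenue is stored as an integer.

```sql
-- Variation on Example 2: growth as a percentage of the previous year's revenue.
SELECT Z.*,
       (Z.Revenue - Z.PrevYearRevenue) * 100.0
           / NULLIF(Z.PrevYearRevenue, 0) AS YoYGrowthPct  -- NULL for the first year of each organisation
FROM (SELECT Organisation, [Year], Revenue,
             LAG(Revenue, 1)
                 OVER (PARTITION BY Organisation ORDER BY [Year]) AS PrevYearRevenue
      FROM Org) Z
ORDER BY Organisation, [Year];
```

For ABCD News in 2014, this works out to 40000 * 100.0 / 440000, or roughly 9.09 percent.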
Use Cases of SQL LAG() Function
The LAG() function can be used in various practical scenarios across different industries:
- Sales Trends: Compare daily, monthly, or yearly sales figures with those of previous periods.
- Stock Analysis: Analyze changes in stock prices over time and identify trends or fluctuations.
- Employee Salaries: Analyze salary growth within a company by comparing current salaries with those of previous years.
- Data Validation: Identify missing or duplicate rows in a sequence.
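As an illustration of the data validation use case, the following sketch looks for gaps in a sequence of years. The inline list of years (with 2016 left out on purpose) and the aliases are made up for illustration. Because window functions cannot appear directly in a WHERE clause, LAG() is computed in a derived table first.

```sql
-- Hypothetical inline data: a sequence of years with 2016 missing.
SELECT G.PrevYear, G.[Year],
       G.[Year] - G.PrevYear - 1 AS MissingYears   -- number of years skipped between the two rows
FROM (SELECT Y AS [Year],
             LAG(Y) OVER (ORDER BY Y) AS PrevYear  -- NULL for the first row
      FROM (VALUES (2013), (2014), (2015), (2017), (2018)) AS T(Y)) G
WHERE G.[Year] - G.PrevYear > 1;                   -- keep only rows that follow a gap
```

With this data the query returns a single row (PrevYear 2015, Year 2017, MissingYears 1). Changing the filter to G.[Year] = G.PrevYear would flag duplicates instead.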
Important Points About SQL LAG() Function
- Window Function: The LAG() function is categorized as a window function. It doesn’t collapse the result set but allows you to access data from previous rows.
- Order of Rows Matters: The ORDER BY clause is crucial because it defines the order in which the rows are processed. Without it, the function wouldn't know which row is the "previous" row.
- Null Values for First Row: For the first row in a partition, there’s no previous row, so LAG() returns NULL unless a default value is specified.
Conclusion
The LAG() function in SQL is a flexible and powerful tool for comparing rows, analyzing trends, and calculating differences in datasets. Its ability to access previous row data makes it ideal for tasks like year-over-year analysis, sales trends, and financial reporting. By mastering the LAG() function, we can perform advanced analytics directly within SQL and reduce the complexity of our reporting queries. The examples and tips in this article are a starting point for using the function effectively in our own database operations.