Code360 powered by Coding Ninjas X Naukri.com. Code360 powered by Coding Ninjas X Naukri.com
Last Updated: Mar 27, 2024

MySQL LEAD and LAG Function

Leveraging ChatGPT - GenAI as a Microsoft Data Expert
Speaker
Prerita Agarwal
Data Specialist @
23 Jul, 2024 @ 01:30 PM

Introduction

You must now be familiar with MySQL, its features and its functionality. In this blog, we will be exploring a new component of MySQL, i.e. Window function. A window function is one that does not perform operations on the whole data. A partition or window is defined in these functions, and these functions perform operations on each row of that partition or window. 

Window functions are different from aggregate functions in a way that aggregate functions return a single row as an output, whereas the window functions return the result for each query row.

Some important terminologies related to Window functions are-

  • Current Row: The row on which the operation is being performed is known as the current row.
  • Window: The set of rows on which the operation will be performed is known as the window.

This blog will discuss two window functions: LEAD and LAG functions.

Let’s start with the LEAD function.

LEAD Function

A LEAD function is a Window function in MySQL that allows us to access the value of the succeeding/forwarding row from the current row. This function is beneficial if we want to compare and find the difference between the current row and the next rows. 

Source: Heavenly City Church

Below is the syntax to use the LEAD function-

LEAD (expression, offset, default_value) OVER (
           PARTITION BY (expression)
           ORDER BY (expression)
)

where,

expression can be any column of the table or built-in function.

offset tells the number of rows to be succeeded from the current row. This value must be a positive integer. If we specify 0, then the next row is considered the current row. If no offset is specified, then by default, it is taken as 1.

default_value contains the value to be returned if there is no subsequent row. By default, it returns the null value.

OVER is in charge of categorising rows into groups. If it is not present, the function operates on all rows.

PARTITION BY divides the result set's rows into partitions to which a function is applied. If this clause is not specified, all rows in the result set are treated as a single row.

ORDER BY tells the sequence of rows in the partitions before applying the function.

Now, let’s take an example to understand the LEAD function.

We will first create a table Sales with four columns - sales_id, customer_id, sales_date and sales_amount using the following command.

mysql> CREATE TABLE SALES (
        sales_id int primary key,
        customer_id int,
        sales_date date,
        sales_amount decimal(16, 2)
);

Then we will insert some records in this table using the following command.

mysql> INSERT INTO SALES VALUES
         (1, 1, '20200201', 500),
         (2, 1, '20200301', 7200),
         (3, 1, '20200401', 3440),
         (4, 2, '20200315', 29990),
         (5, 2, '20200921', 6700),
         (6, 3, '20201026', 4500),
         (7, 3, '20200611', 30000),
         (8, 4, '20201229', 8560);

We can check our inserted records using the following statement.

mysql> SELECT * FROM SALES;

The output of the above query will be-

Now suppose we want the details of the next sales of each customer, so for that purpose, we will use the LEAD function as follows.

mysql> SELECT customer_id, sales_date, sales_amount, LEAD (sales_amount) OVER ( 
PARTITION BY customer_id
ORDER BY sales_date ) next_sale 
FROM SALES;

Let’s understand the above query.

  • The above query will first divide the SALES table into partitions according to the customer_id column. Since we have four distinct values of customer_id, our table will be divided into four partitions (one for each customer_id).
  • Then each partition will be sorted in the increasing order of their sales_date column.
  • Finally, the LEAD function will be applied to the sales_amount column for each partition.

One point to note here is we haven't specified the offset parameter of the LEAD function (since we wanted the immediate subsequent value). So, by default, it will be taken as 1.

The output of the above query will be-

We have displayed our result in a new column named next_sale.

Now, you must be wondering why there are some NULL values in the next_sale column. So, to understand this, consider the customer_id 1. There are three sales for customer_id 1 with the sales_amount 500.00, 7200.00 and 3440.00. The next sale for the first sale amount, i.e. 500.00, will be 7200.00, and for the second sale amount, 7200.00 will be 3440.00. Since we have no fourth sale amount, so, for the third sale amount, the next sale will be NULL.

Similar is the case for the rest of the customer_id.

So, now, if we want to see the next to next sale details of each customer_id, in that case, we have to specify the offset as 2. Let’s run the LEAD function for this query.

mysql> SELECT customer_id, sales_date, sales_amount, LEAD (sales_amount,2) OVER ( 
PARTITION BY customer_id
ORDER BY sales_date ) next_to_next_sale 
FROM SALES;

The output of the above query will be-

Since we have the next to next sale record of only one customer_id ,i.e. 1, the rest values will be NULL.

Now, we will study the LAG function.

Get the tech career you deserve, faster!
Connect with our expert counsellors to understand how to hack your way to success
User rating 4.7/5
1:1 doubt support
95% placement record
Akash Pal
Senior Software Engineer
326% Hike After Job Bootcamp
Himanshu Gusain
Programmer Analyst
32 LPA After Job Bootcamp
After Job
Bootcamp

LAG Function

A LAG function is a Window function in MySQL that allows us to access the value of the preceding/backward row from the current row. As the name suggests, this function is just the opposite of the LEAD function. This function is beneficial if we want to compare and find the difference between the current row and the previous rows. 

Source: Dreamstime.com

Below is the syntax to use the LAG function.

LAG (expression, offset, default_value) OVER (
           PARTITION BY (expression)
           ORDER BY (expression)
)

where,

expression can be any column of the table or built-in function.

offset tells the number of rows to be preceded from the current row. This value must be a positive integer. If we specify 0, then the preceding row is considered the current row. If no offset is specified, then by default, it is taken as 1.

default_value contains the value to be returned if there is no preceding row. By default, it returns the null value.

OVER is in charge of categorising rows into groups. If it is not present, the function operates on all rows.

PARTITION BY divides the result set's rows into partitions to which a function is applied. If this clause is not specified, all rows in the result set are treated as a single row.

ORDER BY tells the sequence of rows in the partitions before applying the function.

Now, let’s take an example to understand the LAG function.

We will consider the same SALES table which we created above.

Suppose we want the details of the previous sales of each customer, so for that purpose, we will use the LAG function as follows.

mysql> SELECT customer_id, sales_date, sales_amount, LAG (sales_amount) OVER ( 
PARTITION BY customer_id
ORDER BY sales_date ) last_sale 
FROM SALES;

Let’s understand the above query.

  • The above query will first divide the SALES table into partitions according to the customer_id column. Since we have four distinct values of customer_id, our table will be divided into four partitions (one for each customer_id).
  • Then each partition will be sorted in the increasing order of their sales_date column.
  • Finally, the LAG function will be applied to the sales_amount column for each division.

One point to note here is we haven't specified the offset parameter of the LAG function (since we want the immediately preceding value). So, by default, it will be taken as 1.

The output of the above query will be-

We have displayed our result in a new column named last_sale.

Now, you must be wondering why there are some NULL values in the last_sale column. The same concept of the LEAD function applies here. To understand this, consider the customer_id 1. There are three sales for customer_id 1 with the sales _amount 500.00, 7200.00 and 3440.00. The last sale for the third sale amount, i.e. 3440.00, will be 7200.00, and for the second sale amount, 7200.00 will be 500.00. Since we have no zeroth sale amount (not possible) or default value, so, for the first sale amount, the last sale will be NULL.

Similar is the case for the rest of the customer_id.

So, now, if we want to see the last to last sale details of each customer_id, in that case, we have to specify the offset as 2. Let’s run the LAG function for this query.

mysql> SELECT customer_id, sales_date, sales_amount, LAG (sales_amount,2) OVER ( 
PARTITION BY customer_id
ORDER BY sales_date ) last_to_last_sale
FROM SALES;

The output of the above query will be-

Since we have a last to last sale record of only one customer_id ,i.e. 1, the rest values will be NULL.

Refer to know about : Update Query in MySQL

FAQs

  1. Define window functions in MySQL.
    A window function does not perform operations on the whole data. A partition or window is defined in these functions, and these functions perform operations on each row of that partition or window. 
     
  2. How many window functions are there in MySQL?
    There are three types of window functions in MySQL-
    • Aggregate functions 
    • Ranking functions 
    • Analytical functions
       
  3. Define aggregate functions. 
    Aggregate functions perform operations on multiple rows and return the result in a single row. The following are some of the most important aggregate functions COUNT, SUM, AVG, MIN, MAX etc.
     
  4. Define ranking functions.
    Ranking functions allow us to rank each partition row in a table. The following are the most crucial ranking functions:  RANK, DENSE_RANK, PERCENT_RANK, ROW_NUMBER, CUME_DIST, etc.
     
  5. Define analytical functions.
    Analytical functions are represented locally by a power series. The critical analytical functions are NTILE, LEAD, LAG, NTH, FIRST_VALUE, LAST_VALUE, etc.

Key Takeaways

In this blog, we talked about the LEAD and LAG functions in MySQL. LEAD functions are used to see a succeeding row, and LAG functions are used to see a preceding row. We thoroughly discussed both functions with their syntaxes and examples.

Check out this problem - Smallest Distinct Window .

Recommended Readings:

Attention Ninja!!!!! Don’t stop here. Start your DBMS journey with the DBMS course on Coding Ninjas. To master SQL, solve our Top 100 SQL Problems asked in various company interviews.

Happy Learning!

Topics covered
1.
Introduction
2.
LEAD Function
3.
LAG Function
4.
FAQs
5.
Key Takeaways