Do you think IIT Guwahati certified course can help you in your career?
No
Introduction
Welcome to Coding Ninjas; Today, we are going to discuss a clause in SQL called “PARTITION BY”. In this article, we will discuss what PARTITION BY SQL is, its syntax, analytical functions and windowing, and its use cases. We will also discuss its limitations and alternatives.
Before moving on to the PARTITION BY, I will recommend you understand a clause in this article on ‘SQL Clauses and Its Types’.
What is the PARTITION BY Clause in SQL?
PARTITION BY is a clause in SQL that is used to divide the result into partitions or subsets to perform the computations. The computation is done on the basis of number of columns. PARTITION BY is used in conjunction with analytical functions.
PARTITION BY is a clause that can be used in operations where the analytical functions like. SUM(), AVG(), MIN(), MAX() and RANK() are involved. It is useful to use PARTITION BY, where the partition on the basis of some kind of ID may need.
This can be used in various SQL database systems such as SQL Server, Oracle, PostgreSQL, and SQLite.
Here is a basic syntax for using PARTITION BY SQL:
SELECT col1, col2, col3, ..., analytic_function() OVER (PARTITION BY partition_col1, partition_col2, partition_col3, ...)
FROM table_name
In the above syntax query, SELECT is used to get the given columns of the table (col1, col2, and col3) on which the analytical functions will be applied. Here are multiple columns (partition_col1, partition_col2, partition_col3) which show the data is to be partitioned by PARTITION BY clause.
Analytical Functions in SQL
There are several analytical functions in SQL that you can perform on a row that is returned by a query.
Here are some of the analytical functions that are commonly used
RANK(): It is an analytical function that is used to assign the rank to the rows in a specific column, and rank may or may not be unique means if two rows are the same, the same rank will be assigned to them.
ROW_NUMBER(): This function is used to assign a unique row number to each row in the table on the basis on which the rows are retrieved from the table.
SUM(): As the name suggests, SUM() is an aggregate function that can be used to calculate the total of a specific column passed to this function.
MIN() and MAX(): Both of these functions are used to find the minimum and maximum values from a specific column that is passed from the table, respectively.
FIRST_VALUE() and LAST_VALUE(): These functions are used to get the first and last values from the partition, respectively.
NTH_VALUE(): This function is used to get a specific value of a row in a partition of the table.
More Examples
Calculating Average
Suppose we have a table ‘products_table’ with five products in it, and every product has its product_id’, ‘product_name’, and ‘product_price’, as you can see below table:
product_id
product_name
product_price
1
A
10
1
B
15
2
C
20
3
D
25
3
E
30
Now, we want to find the average price of the product with the same unique ‘product_id’. For this, we will use the analytical function AVG, which will help us to find the average price, and ‘product_id’ can be used to partition the table using PARTITION BY.
Here is the SQL command to perform this operation:
SELECT product_id, product_name, product_price,
AVG(product_price) OVER (PARTITION BY product_id) AS avg_price_by_id
FROM products_table;
The result table will be printed on the SQL Client or Database Management Tool that you’re using; the result table will be similar to this:
product_id
product_name
product_price
avg_price_by_id
1
A
10
12.5
1
B
15
12.5
2
C
20
20
3
D
25
27.5
3
E
30
27.5
Calculating Sum
In this example, we will be calculating the sum of the partition using the analytical function ‘SUM()’. Here is the table on which we will be calculating the sum on the basis of product_id.
product_id
product_name
product_price
1
A
50
1
A
75
2
B
100
3
C
25
2
B
50
1
A
100
Here, we partition by product_id using PARTITION BY and calculate the sum using SUM() with the column product_price. Here is the query to perform this operation:
SELECT product_id, product_name, SUM(product_price) OVER (PARTITION BY product_id) as total_sum
FROM products;
Here is the output table after this query:
product_id
product_name
product_price
total_sum
1
A
50
225
1
A
75
225
2
B
100
150
3
C
25
25
2
B
50
150
1
A
100
225
Finding Maximum Price
In this example, we will be finding the maximum value of the partition using the analytical function ‘MAX()’. We will find the maximum price of the products in a partition, and the partition will be done by product_id. Here is the table for a better understanding:
product_id
product_name
product_price
1
A
10.99
1
B
5.99
2
C
1.99
2
D
3.49
3
E
15.99
3
F
7.99
Here, we partition by product_id using PARTITION BY and find the maximum using MAX() with the column product_price. Here is the query to perform this operation:
SELECT product_id, product_name, product_price, MAX(product_price) OVER (PARTITION BY product_id) AS max_price
FROM products.
Here is the output table after this query:
product_id
product_name
product_price
max_price
1
A
10
10
1
B
5
10
2
C
1
3
2
D
3
3
3
E
15
15
3
F
7
15
Ranking the Products
In this example, we will use an analytical function called ‘RANK()’ by which we can assign the rank to the rows in a specific column. Here is the table on which we are going to perform this operation:
product_id
product_name
product_price
1
A
10
1
B
20
2
C
30
2
D
40
3
E
50
3
F
60
3
G
70
Here we are performing partition by product_id using PARTITION BY and finding the rank of each row using their product_price for comparison. Here is the query to perform this operation:
SELECT product_id, product_name, product_price,
RANK() OVER (PARTITION BY product_id ORDER BY product_price DESC) AS rank
FROM products;
PARTITION BY is a clause in SQL that is used to divide the result into partitions or subsets to perform the computations. The computation is done on the basis of number of columns. PARTITION BY is used in conjunction with analytical functions.
What is the difference between partition by and group by?
In order to partition result sets and enable calculations within each partition, SQL window functions require the "PARTITION BY" parameter. By merging rows with identical values into summary rows, "GROUP BY" aggregates data. Typically, aggregate functions like COUNT or SUM are used.
How to create partition in SQL?
Use the "PARTITION BY" clause in a window function in SQL to create a partition. Give specifics on the partitioning criteria that will separate result sets into logical parts for computation or analysis, improving the capabilities for data manipulation and reporting.
How to count partition by in SQL?
Use the "COUNT" function with the "PARTITION BY" phrase in SQL to count within partitions. The result set is divided into groups according to the given criteria, and the occurrences within each group are then counted independently.
Conclusion
PARTITION BY SQL is used to divide a result into partitions for the computations. It has analytical functions such as RANK(), DENSE_RANK(), and ROW_RANK(). In this article, we discussed what this clause is, how it works, and its syntax and discussed this clause by taking different examples. We also discussed several analytical functions in SQL.
As we have discussed, PARTITION BY is one of the clauses; If you want to read about clauses, I will recommend you to read this article on ‘SQL Clauses and Its Types’.