Introduction
Pandas is an open-source Python library that provides data manipulation and analysis tools. It includes data structures such as DataFrame and Series for handling structured data. The library also offers various functions, such as rank(), groupby(), and count(), for reshaping or manipulating your dataset.
The rank() function calculates the rank of each data element based on its value, for example the values in a chosen column. In this article, you will learn about the rank() function in Pandas with the help of some examples.
Let’s get started.
What is the rank() function in Pandas?
The rank() function assigns a rank to the elements in a Series or DataFrame. It computes the numerical rank of each element based on its position in the sorted data, so by default the smallest value gets rank 1. Tied elements receive, by default, the average of the ranks they would otherwise have received.
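The default tie handling described above can be seen in a minimal sketch. The scores below are made up for illustration and are not part of the article's examples.

```python
import pandas as pd

# Hypothetical scores; the two 20s tie for sorted positions 2 and 3.
scores = pd.Series([10, 20, 20, 30])

# With the default method="average", tied values share the mean of
# the positions they occupy: (2 + 3) / 2 = 2.5.
ranks = scores.rank()
print(ranks.tolist())  # [1.0, 2.5, 2.5, 4.0]
```

Note that the ranks come back as floats even though the input holds integers, because averaged ranks can be fractional.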
Parameters
The rank() function accepts the following parameters:
method: This parameter controls how ties are handled. It can take the following values: average (the default; tied values get the mean of their ranks), min (tied values get the lowest rank in the group), max (tied values get the highest rank in the group), first (ties are ranked in the order they appear in the data), and dense (like min, but the rank always increases by exactly 1 from one group of values to the next, so there are no gaps). We'll take a closer look at each of them in the examples.
axis: This parameter specifies the direction of ranking for a DataFrame. With 0 (the default), values are ranked down each column, so every column is ranked independently; with 1, values are ranked across each row.
na_option: This parameter controls how missing values are treated. It can take the following values: keep (NaN values receive NaN ranks; this is the default), top (NaN values receive the lowest ranks, so they are ranked first), and bottom (NaN values receive the highest ranks, so they are ranked last).
ascending: This is a boolean parameter specifying whether the ranks should be assigned in ascending (True) or descending (False) order. The default value is True.
pct: If this parameter is True, the rankings are returned in percentile form, as fractions between 0 and 1. By default, its value is False.
numeric_only: If this parameter is True, the function only ranks columns with numeric values. By default, this parameter is set to False.
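The tie-breaking methods and the ascending flag listed above can be compared side by side. The following is a minimal sketch; the values are hypothetical and chosen so that two of them tie.

```python
import pandas as pd

# Hypothetical values; the two 20s are tied.
values = pd.Series([30, 10, 20, 20, 40])

# One column per option, so the differences are easy to compare.
comparison = pd.DataFrame({
    "value": values,
    "average": values.rank(method="average"),
    "min": values.rank(method="min"),
    "max": values.rank(method="max"),
    "first": values.rank(method="first"),
    "dense": values.rank(method="dense"),
    "descending": values.rank(ascending=False),
})
print(comparison)
```

In the output, the tied rows differ between columns: average gives both 2.5, min gives both 2, max gives both 3, first gives 2 and 3, and dense gives both 2 while the next value gets 3 instead of 4.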
Return Types
The return type is the same as the caller object. For example, if the caller object is a dataframe, the function returns a dataframe containing the data ranks as values.
In the next section, we will go through some examples of using the rank() function in Pandas.
Examples of the rank() Function in Pandas
Example 1
For the first example, we will create a dataframe containing two columns, product_name and quantity.
Python
import pandas as pd
data = {'product_name': ['Apple', 'Banana', 'Orange', 'Grapes', 'Kiwi'],
        'quantity': [10, 15, 8, None, 15]}
df = pd.DataFrame(data)
print(df)
When the rank() function is called on an entire dataframe instead of a single column, the result is a dataframe. Because the axis parameter is 0 by default, each column is ranked independently down its rows.
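As a rough sketch of the difference between ranking one column and ranking the whole dataframe, the snippet below reuses the data from Example 1. Passing numeric_only=True is an assumption made here so that only the quantity column is ranked; it is not necessarily what the original example did.

```python
import pandas as pd

data = {'product_name': ['Apple', 'Banana', 'Orange', 'Grapes', 'Kiwi'],
        'quantity': [10, 15, 8, None, 15]}
df = pd.DataFrame(data)

# Ranking a single column returns a Series.
quantity_rank = df['quantity'].rank()

# Ranking the whole dataframe returns a dataframe. numeric_only=True
# (an assumption for this sketch) limits the result to numeric columns.
df_rank = df.rank(numeric_only=True)

print(type(quantity_rank).__name__)  # Series
print(type(df_rank).__name__)        # DataFrame
print(df_rank)
```

The missing quantity for Grapes keeps a NaN rank, since na_option defaults to keep, and the two 15s share the average rank 3.5.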
Example 3
In this example, we will set the pct parameter to True.
Python
import pandas as pd
data = {'Product': ['Apple', 'Banana', 'Orange', 'Grapes', 'Pear'],
        'Sales': [100, 150, 200, 150, 200]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print("\n")
df['Rank_Pct'] = df['Sales'].rank(pct=True)
print(df)
Here, the rankings are displayed in percentile form: each rank is expressed as a fraction between 0 and 1 rather than as a position.
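To make the percentile form more concrete, the following sketch compares pct=True with a manual calculation on the Sales values from Example 3. It assumes the default method (average), for which the percentile rank is the plain rank divided by the number of non-missing values.

```python
import pandas as pd

# Sales values from Example 3.
sales = pd.Series([100, 150, 200, 150, 200])

plain_rank = sales.rank()        # default method="average"
pct_rank = sales.rank(pct=True)

# With the default method, the percentile rank matches the plain rank
# divided by the count of non-missing values.
manual_pct = plain_rank / sales.count()

print(pd.DataFrame({"rank": plain_rank, "pct": pct_rank, "manual": manual_pct}))
```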
In the following examples, we will use the rank() function in combination with functions such as groupby() and sort_values().
Example 4
In this example, we will use the rank() function to calculate ranks within groups in our dataframe. First, we will create a dataframe with 3 columns: Category, Product, and Quantity. Then we will use the groupby() function to group the rows by the Category column. After the grouping, we will use the rank() function on the Quantity column with default parameters.
Python
import pandas as pd
data = {'Category': ['Fruit', 'Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
Here, row indices 0 and 4 are ranked 1 because they have the lowest quantity in their respective groups.
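The grouped ranking described above can be sketched as follows. Only the Category values come from the example; the Product names and Quantity numbers are hypothetical, chosen so that rows 0 and 4 hold the lowest quantity in their groups.

```python
import pandas as pd

data = {
    'Category': ['Fruit', 'Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
    # Hypothetical products and quantities, for illustration only.
    'Product': ['Apple', 'Banana', 'Carrot', 'Orange', 'Potato', 'Grapes'],
    'Quantity': [5, 12, 9, 8, 4, 15],
}
df = pd.DataFrame(data)

# Rank Quantity separately within each Category, using default parameters.
df['Rank'] = df.groupby('Category')['Quantity'].rank()
print(df)
```

Because the ranking restarts inside every group, both a Fruit row and a Vegetable row can hold rank 1 at the same time.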
Example 5
This example uses a dataframe with 3 columns: Name, Math, and Science. We will first calculate the ranks separately based on the Math and Science columns, and then we will find the mean rank for each student and sort the dataframe based on the mean rank.
Python
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
Note that the ascending parameter was set to False, and the method parameter was set to min for breaking ties. The student with the highest scores gets a rank closer to 1.
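A minimal sketch of this approach is shown below. The names come from the example, but the Math and Science scores and the column names Math_Rank, Science_Rank, and Mean_Rank are assumptions made for illustration.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    # Hypothetical scores, for illustration only.
    'Math': [85, 92, 78, 92],
    'Science': [90, 95, 70, 80],
}
df = pd.DataFrame(data)

# Highest score gets rank 1; tied scores share the lowest rank in the tie.
df['Math_Rank'] = df['Math'].rank(ascending=False, method='min')
df['Science_Rank'] = df['Science'].rank(ascending=False, method='min')

# Average the two ranks for each student, then sort best to worst.
df['Mean_Rank'] = df[['Math_Rank', 'Science_Rank']].mean(axis=1)
result = df.sort_values('Mean_Rank')
print(result)
```

With these made-up scores, Bob and David tie in Math and both receive rank 1 there, while the next Math rank assigned is 3.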
Frequently Asked Questions
Does the rank() function in Pandas modify the original data?
The rank() function doesn't modify the original Dataframe or Series. It returns a new Dataframe or Series containing the calculated rank values based on the input data structure.
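A quick way to check this is to compare the data before and after the call. The Series below is hypothetical.

```python
import pandas as pd

original = pd.Series([3, 1, 2])  # hypothetical data
snapshot = original.copy()

# rank() returns a new Series and leaves the caller untouched.
ranked = original.rank()

print(original.equals(snapshot))  # True
print(ranked.tolist())            # [3.0, 1.0, 2.0]
```

To keep the ranks, assign the result to a variable or to a new column.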
What is NaN in Python?
In Python, NaN stands for "Not a Number", a special floating point value used to represent undefined or unrepresentable numerical values. In Pandas, NaN represents missing values, which can come from many sources, such as gaps in the source data or operations like concatenating or reindexing dataframes whose labels do not line up.
How can I use the rank() function to handle missing values?
The rank() function allows you to control the treatment of missing values using the na_option parameter, which can be set to keep, top, or bottom.
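The three na_option settings can be compared in a short sketch. It reuses the quantity values from Example 1, which contain one missing entry.

```python
import pandas as pd

# Quantity values from Example 1; the None becomes NaN.
quantity = pd.Series([10, 15, 8, None, 15])

keep = quantity.rank(na_option='keep')      # NaN stays NaN (default)
top = quantity.rank(na_option='top')        # NaN gets the lowest rank
bottom = quantity.rank(na_option='bottom')  # NaN gets the highest rank

print(pd.DataFrame({'quantity': quantity, 'keep': keep,
                    'top': top, 'bottom': bottom}))
```

With top, the missing value takes rank 1 and every other rank shifts up by one; with bottom, it takes rank 5 and the other ranks are unchanged.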
Conclusion
In this article, you learned about the rank() function in Pandas with the help of various examples. We also used it in combination with other functions provided by Pandas, such as groupby() and mean().