Table of contents
1. Introduction
2. What is the rank() function in Pandas?
2.1. Parameters
2.2. Return Types
3. Examples of the rank() Function in Pandas
3.1. Example 1
3.2. Example 2
3.3. Example 3
3.4. Example 4
3.5. Example 5
4. Frequently Asked Questions
4.1. Does the rank() function in Pandas modify the original data?
4.2. What is NaN in Python?
4.3. How can I use the rank() function to handle missing values?
5. Conclusion
Last Updated: Mar 27, 2024

Pandas rank() Function

Author Abhinav Anand

Introduction

Pandas is an open-source Python library that provides data manipulation and analysis tools. It includes data structures such as DataFrame and Series for handling structured data. The library also offers functions such as rank(), groupby(), and count() for reshaping or manipulating your dataset.

The rank() function calculates the rank of each data element relative to the other values in the same column or row. In this article, you will learn about the rank() function in Pandas with the help of some examples.

Let’s get started.

What is the rank() function in Pandas?

The rank() function assigns a rank to each element in a Series or DataFrame. It computes the numerical rank of each element from its position in the sorted data, with the smallest value ranked 1 by default. Tied values each receive, by default, the average of the ranks they would otherwise have occupied.

Parameters

The rank() function accepts the following parameters:

  1. method: This parameter controls how ties are handled. It can take the following values: average (the default), min, max, first (ties are ranked in their order of appearance), and dense (like min, but the rank always increases by exactly 1 from one group of tied values to the next, so no ranks are skipped). We'll take a closer look at each of them in the examples.

  2. axis: This parameter specifies the direction of ranking. With 0 (the default), the values in each column are ranked against one another; with 1, the values in each row are ranked against one another.

  3. na_option: This parameter controls how missing values are treated. It can take the following values: keep (the default; NaN values receive NaN ranks), top (NaN values receive the lowest rank numbers), and bottom (NaN values receive the highest rank numbers).

  4. ascending: This boolean parameter specifies whether ranks are assigned in ascending (True) or descending (False) order. The default value is True, so the smallest value gets rank 1.

  5. pct: If this parameter is True, each rank is returned as a fraction between 0 and 1 (the rank divided by the number of ranked values) instead of as an absolute rank. By default, its value is False.

  6. numeric_only: If this parameter is True, only columns with numeric values are ranked when the function is called on a DataFrame. By default, this parameter is set to False.
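
The tie-breaking options are easiest to compare side by side. The following is a minimal sketch using hypothetical sample data (the quantity values are made up for illustration); it ranks the same Series once per method.

```python
import pandas as pd

# Hypothetical sample data: the value 15 appears twice, so it creates a tie.
quantity = pd.Series([10, 15, 8, 15, 20])

# Build one column of ranks per tie-breaking method.
methods = ["average", "min", "max", "first", "dense"]
comparison = pd.DataFrame({m: quantity.rank(method=m) for m in methods})
comparison.insert(0, "quantity", quantity)

print(comparison)
```

The two 15s would occupy ranks 3 and 4. With average both get 3.5, with min both get 3, with max both get 4, and with first they get 3 and 4 in order of appearance. With dense both get 3 and the next value, 20, is ranked 4 rather than 5.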

Return Types

The return type is the same as the caller object. For example, if the caller object is a dataframe, the function returns a dataframe containing the data ranks as values.
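
As a quick check of this behaviour, the sketch below (with hypothetical sample data) calls rank() on a single column and on a whole dataframe and inspects what comes back.

```python
import pandas as pd

# Hypothetical sample data with an integer column and a float column.
df = pd.DataFrame({"quantity": [10, 15, 8], "price": [0.5, 0.25, 0.3]})

series_ranks = df["quantity"].rank()  # Series in, Series out
frame_ranks = df.rank()               # DataFrame in, DataFrame out

print(type(series_ranks).__name__)
print(type(frame_ranks).__name__)
print(frame_ranks.dtypes)
```

The ranks are stored as floating-point numbers even when the input column holds integers, because tied values can receive fractional ranks such as 2.5.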

In the next section, we will go through some examples of using the rank() function in Pandas.

Examples of the rank() Function in Pandas

Example 1

For the first example, we will create a dataframe containing two columns, product_name and quantity.

import pandas as pd

data = {'product_name': ['Apple', 'Banana', 'Orange', 'Grapes', 'Kiwi'],
        'quantity': [10, 15, 8, None, 15]}

df = pd.DataFrame(data)

print(df)

Now, we will apply the rank function with varying parameters.

# Continues from the dataframe df created above

# Applying rank() with default parameters (method='average')
df['Rank_Default'] = df['quantity'].rank()

# Applying rank() with method='min' and ascending=False
df['Rank_MethodMin_Desc'] = df['quantity'].rank(method='min', ascending=False)

# Applying rank() with method='max'
df['Rank_MethodMax'] = df['quantity'].rank(method='max')

# Applying rank() with method='first'
df['Rank_MethodFirst'] = df['quantity'].rank(method='first')

# Applying rank() with method='dense'
df['Rank_MethodDense'] = df['quantity'].rank(method='dense')

# Applying rank() with na_option='top'
df['Rank_NA_Top'] = df['quantity'].rank(na_option='top')

print(df)

Here, we call the rank function on the quantity column and add the returned series as additional columns to the dataframe.
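
To see what combining method='min' with ascending=False does to these quantities, here is a small self-contained sketch that reuses the quantity values from this example and ranks them on their own.

```python
import pandas as pd

# Same quantities as in the example; None becomes NaN in the Series.
quantity = pd.Series([10, 15, 8, None, 15], name="quantity")

# Largest quantity gets rank 1; tied values share the lowest rank in the tie.
desc_min = quantity.rank(method="min", ascending=False)

print(desc_min.tolist())  # [3.0, 1.0, 4.0, nan, 1.0]
```

Both 15s share rank 1, so rank 2 is skipped and 10 is ranked 3. The missing quantity keeps a NaN rank because na_option defaults to keep.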

Example 2

We applied the rank() function to a single column in the previous example. Now let’s see how to use it on an entire dataframe.

import pandas as pd

data = {'product_name': ['Apple', 'Banana', 'Orange', 'Grapes'],
        'quantity': [10, 15, 8, 20],
        'price': [0.5, 0.25, 0.3, 0.6]}

df = pd.DataFrame(data)

ranked_df = df.rank(numeric_only=True)

print("Original DataFrame:")
print(df)

print("\nDataFrame with Rank Values:")
print(ranked_df)

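Here, the result is a dataframe because we called the rank() function on an entire dataframe instead of a single column. Because the axis parameter is 0 by default, each column is ranked independently, from its first row to its last. The product_name column is left out of the result because numeric_only is set to True.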
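
The effect of the axis parameter is easier to see when the columns hold comparable values. The sketch below uses a hypothetical table of student scores (names and numbers are made up) and ranks it in both directions.

```python
import pandas as pd

# Hypothetical scores: one row per student, one column per subject.
scores = pd.DataFrame(
    {"math": [85, 92, 78], "science": [70, 95, 88]},
    index=["Alice", "Bob", "Charlie"],
)

by_column = scores.rank()      # axis=0: compare students within each subject
by_row = scores.rank(axis=1)   # axis=1: compare subjects within each student

print(by_column)
print(by_row)
```

With axis=0, Bob is ranked 3 in both subjects because he has the highest score in each column. With axis=1, Bob's math score is ranked 1 and his science score 2, because 92 is the lower of his own two scores.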

Example 3

In this example, we will set the pct parameter to True.

import pandas as pd

data = {'Product': ['Apple', 'Banana', 'Orange', 'Grapes', 'Pear'],
        'Sales': [100, 150, 200, 150, 200]}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print("\n")

df['Rank_Pct'] = df['Sales'].rank(pct=True)

print(df)

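Here, each rank is expressed as a fraction of the number of values instead of as an absolute rank. For example, the two rows with sales of 150 tie for ranks 2 and 3, so each receives an average rank of 2.5, which becomes 2.5 / 5 = 0.5.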
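
One way to convince yourself of what pct=True computes is to reproduce it by hand. The following sketch reuses the Sales values from this example and divides the ordinary ranks by the number of values; it assumes there are no missing values in the data.

```python
import pandas as pd

sales = pd.Series([100, 150, 200, 150, 200])

pct_rank = sales.rank(pct=True)

# Manual equivalent: ordinary (average) rank divided by the number of values.
manual = sales.rank() / sales.count()

print(pct_rank.tolist())
print(manual.tolist())
```

Both lines print the same fractions: 0.2 for 100, 0.5 for each 150, and 0.9 for each 200.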

In the following examples, we will use the rank() function in combination with functions such as groupby() and sort_values().

Example 4

In this example, we will use the rank() function to calculate ranks within groups in our dataframe. First, we will create a dataframe with 3 columns: Category, Product, and Quantity. Then we will use the groupby() function to group the rows by the Category column. After the grouping, we will apply the rank() function to the Quantity column with default parameters.

import pandas as pd

data = {'Category': ['Fruit', 'Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
        'Product': ['Apple', 'Banana', 'Carrot', 'Grapes', 'Cabbage', 'Orange'],
        'Quantity': [10, 15, 8, 20, 5, 12]}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print("\n")

# Applying rank() within each group using groupby()
df['Rank_Within_Group'] = df.groupby('Category')['Quantity'].rank()

print(df)

Here, row indices 0 and 4 are ranked 1 because they have the lowest quantity in their respective groups.
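
A common follow-up is to pick the top item in each group. The sketch below reuses the data from this example, ranks in descending order within each category, and keeps the rows ranked 1; the column name Rank_Desc is a hypothetical name chosen for illustration.

```python
import pandas as pd

data = {'Category': ['Fruit', 'Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
        'Product': ['Apple', 'Banana', 'Carrot', 'Grapes', 'Cabbage', 'Orange'],
        'Quantity': [10, 15, 8, 20, 5, 12]}
df = pd.DataFrame(data)

# Rank within each category so the largest quantity gets rank 1.
df['Rank_Desc'] = df.groupby('Category')['Quantity'].rank(ascending=False)

# Keep only the best-stocked product of each category.
top_per_category = df[df['Rank_Desc'] == 1]

print(top_per_category)
```

This keeps Carrot for the Vegetable group and Grapes for the Fruit group. If several products tied for the largest quantity, the default average method would give none of them a rank of exactly 1, so method='min' would be a safer choice in that case.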

Example 5

This example has a dataframe with 3 columns - Name, Math, and Science. We will first calculate the ranks separately based on the Math and Science columns, and then we will find the mean rank for each student and sort the dataframe based on the mean rank.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Math': [85, 92, 78, 92],
        'Science': [70, 88, 95, 82]}

df = pd.DataFrame(data)

# Calculate ranks for 'Math' and 'Science'
df['Math_Rank'] = df['Math'].rank(ascending=False, method='min')
df['Science_Rank'] = df['Science'].rank(ascending=False, method='min')

# Calculate the mean rank for each student
df['Mean_Rank'] = df[['Math_Rank', 'Science_Rank']].mean(axis=1)

# Sort according to the mean rank
df_sorted = df.sort_values(by='Mean_Rank')

print("Original DataFrame:")
print(df)

print("\nDataFrame Sorted by Mean Rank:")
print(df_sorted)

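Note that we set the ascending parameter to False, so the highest score in each subject gets rank 1, and the method parameter to min, so tied scores share the lowest rank in the tie (Bob and David both rank 1 in Math). The student with the smallest mean rank, and therefore the strongest overall performance, appears first in the sorted dataframe.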
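
The same result can be reached by ranking both subject columns in one call, since rank() also works on a dataframe. This is a sketch of that alternative using the data from this example; the variable names subject_ranks and ordered_names are chosen for illustration.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Math': [85, 92, 78, 92],
        'Science': [70, 88, 95, 82]}
df = pd.DataFrame(data)

# Rank both subject columns at once; each column is ranked independently.
subject_ranks = df[['Math', 'Science']].rank(ascending=False, method='min')

# Average the two ranks per student and sort by the result.
df['Mean_Rank'] = subject_ranks.mean(axis=1)
ordered_names = df.sort_values(by='Mean_Rank')['Name'].tolist()

print(ordered_names)  # ['Bob', 'David', 'Charlie', 'Alice']
```

Bob comes first with a mean rank of 1.5 (rank 1 in Math, rank 2 in Science), followed by David (2.0), Charlie (2.5), and Alice (3.5).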

Frequently Asked Questions

Does the rank() function in Pandas modify the original data?

The rank() function doesn't modify the original DataFrame or Series. It returns a new DataFrame or Series, matching the type of the input, that contains the calculated rank values.
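
A short sketch with hypothetical values shows this: the Series is copied before ranking and compared with itself afterwards.

```python
import pandas as pd

# Hypothetical sample data.
quantity = pd.Series([30, 10, 20])
snapshot = quantity.copy()

ranks = quantity.rank()

print(quantity.equals(snapshot))  # the original values are untouched
print(ranks.tolist())             # the ranks live in a separate object
```

To keep the ranks alongside the data, assign the result to a new column or variable, as the examples above do.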

What is NaN in Python?

In Python, NaN stands for “Not a Number”, a special floating-point value used to represent undefined or unrepresentable numerical values. In Pandas, NaN represents missing values, which can come from gaps in the source data or from operations such as concatenating or reindexing dataframes whose labels do not line up.
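
The sketch below illustrates two properties of NaN that matter in practice: it never compares equal to anything, and Pandas introduces it when a label has no data. The stock Series and its labels are hypothetical.

```python
import math

import pandas as pd

nan = float("nan")
print(nan == nan)       # False: NaN is not equal even to itself
print(math.isnan(nan))  # True: use isnan (or pd.isna) to detect it

# Reindexing with a label that has no data produces a missing value.
stock = pd.Series([10, 15], index=["Apple", "Banana"])
restocked = stock.reindex(["Apple", "Banana", "Kiwi"])

print(restocked)
print(restocked.isna().tolist())  # [False, False, True]
```

Because equality checks fail for NaN, missing values should be detected with isna() rather than with the == operator.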

How can I use the rank() function to handle missing values?

The rank() function lets you control the treatment of missing values using the na_option parameter, which can be set to keep, top, or bottom.
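
As an illustration, the following sketch ranks the quantity values from Example 1 (which contain one missing entry) with each of the three options.

```python
import pandas as pd

# Quantities from Example 1; None becomes NaN.
quantity = pd.Series([10, 15, 8, None, 15])

keep = quantity.rank(na_option="keep")      # NaN stays NaN (default)
top = quantity.rank(na_option="top")        # NaN gets the lowest rank number
bottom = quantity.rank(na_option="bottom")  # NaN gets the highest rank number

print(pd.DataFrame({"quantity": quantity, "keep": keep,
                    "top": top, "bottom": bottom}))
```

With keep the missing entry has no rank, with top it is ranked 1 and every other rank shifts up by one, and with bottom it is ranked 5 while the other ranks stay as they were.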

Conclusion

In this article, you learned about the rank() function in Pandas with the help of various examples. We also used it in combination with other functions provided by Pandas, such as groupby() and mean().

Happy Learning!
