Table of contents
1. Introduction
2. What is the Pandas Index?
3. Default Index
4. Creating a Custom Index
5. How to use Pandas Index
5.1. Setting the Index
5.2. Access the data
5.3. Resetting the index
5.4. Sorting the index
6. What is Pandas ReIndex?
6.1. Syntax
6.2. Parameters
6.3. Example of ReIndex
6.4. Why Reindex?
7. Frequently Asked Questions
7.1. What is the use case of the Pandas Index?
7.2. How are missing values handled in Reindexing?
7.3. How to change the default index in a Pandas DataFrame?
8. Conclusion
Last Updated: Mar 27, 2024

Pandas Index and Pandas Reindex

Author Dhruv Rawat

Introduction

Pandas is a widely used open-source data manipulation and analysis library for Python. It offers a variety of powerful tools for handling and analysing structured data.


Two important concepts are the Pandas Index and Pandas Reindex. They both play an important role in efficiently organising, accessing, and transforming the data.

Continue reading the article to learn about the Pandas Index and Pandas Reindex. 

What is the Pandas Index?

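An index in Pandas is a data structure that labels the rows of a DataFrame. A DataFrame is a two-dimensional, tabular data structure with labelled rows and columns; it can be built from sources such as JSON, CSV files, web APIs, etc.

The index acts as a reference for quick data retrieval, alignment, and manipulation. Each row in a DataFrame is associated with an index value that identifies it and lets us access specific data points. Index values are usually unique, although Pandas does not require this.

Note: an Index object is immutable, which means its individual values cannot be modified in place once it is created. The index of a DataFrame can still be replaced as a whole with a new one.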
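As a minimal sketch of what immutability means in practice, the snippet below tries to overwrite one label of an Index and then replaces the whole index of a DataFrame instead. The names idx, df and error_message are illustrative only and are not part of the article's examples.

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'c'])

error_message = None
try:
    idx[0] = 'z'  # item assignment on an Index is not supported
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Replacing the entire index of a DataFrame is allowed
df = pd.DataFrame({'value': [1, 2, 3]}, index=idx)
df.index = ['x', 'y', 'z']
print(df.index.tolist())
```

So the labels themselves are fixed, but a DataFrame can always be given a different index.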

Default Index

By default, a Pandas DataFrame is created with an integer index that starts from 0. This is the default index. But we can also create our own custom index by passing a list or array of labels to the index argument when creating the DataFrame.

To use the default integer index, we don't need to do anything special; Pandas sets it automatically.
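A short sketch of the default index, using a made-up single-column DataFrame (the city names are illustrative assumptions):

```python
import pandas as pd

# No index argument is passed, so Pandas assigns the default integer index
df = pd.DataFrame({'city': ['Delhi', 'Mumbai', 'Pune']})

print(df.index)                 # a RangeIndex starting at 0
default_labels = list(df.index)
print(default_labels)
```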

Creating a Custom Index

To create a custom index, pass a list or array of labels to the index argument while creating the DataFrame.

For example, the following code creates a DataFrame with a custom index of the names of the employees:


import pandas as pd

employee_data = {
   'name': ['John Doe', 'Jane Doe', 'Peter Smith'],
   'department': ['Sales', 'Marketing', 'Engineering'],
   'salary': [100000, 80000, 90000]
}
df = pd.DataFrame(employee_data, index=['John Doe', 'Jane Doe', 'Peter Smith'])
print(df)


Output:

custom index output

The output shows that the index now consists of the employee names, because we set the index to the list ['John Doe', 'Jane Doe', 'Peter Smith'].

How to use Pandas Index 

Let us discuss some examples of how to use the Pandas Index.

Setting the Index

To set the index of a DataFrame to a different column or set of columns, we can use the set_index() method.

Below is the syntax of set_index():

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

The parameters and the return value of set_index() are described below:

keys: The name of the column to use as the index. Can also be a list of column names to create a multi-level index.

drop: Whether to drop the column(s) used to set the index. Defaults to True. If True, the column(s) used to set the index are dropped from the DataFrame. If False, the column(s) are not dropped.

append: Whether to append the new index to the existing index. Defaults to False. If True, the new index is appended to the existing index. If False, the new index replaces the existing index.

inplace: Whether to modify the DataFrame in place. Defaults to False. If True, the DataFrame is modified in place. If False, a new DataFrame is created with the new index.

verify_integrity: Whether to check the new index for duplicates. Defaults to False. If True, the new index is checked for duplicates and an error is raised if any are found. If False, no check is performed.

Return value: A DataFrame with the new index, or None if inplace is True.

Here is an example of how to set a column as the index:


import pandas as pd

student_data = {
   'name': ['Dev', 'Kevesh', 'Smith'],
   'age': [20, 23, 21],
   'grade': [85, 90, 95],
   'location': ['New Delhi', 'Dehradun', 'Atlanta'],
}

df = pd.DataFrame(student_data)

# Set the index to the name column
df = df.set_index('name')

print(df)


Output:

custom index output

In the above code, we first create a DataFrame from student_data. Then, we use the set_index() method to set the index to the name column. The values of the name column now serve as the row labels of the DataFrame.
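The other set_index() parameters can be sketched in the same way. The snippet below is only an illustration under assumed data: the dupes DataFrame and the variable names kept, multi and duplicate_error are hypothetical and not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Dev', 'Kevesh', 'Smith'],
    'age': [20, 23, 21],
    'location': ['New Delhi', 'Dehradun', 'Atlanta'],
})

# drop=False keeps 'name' as a regular column as well as the index
kept = df.set_index('name', drop=False)
print('name' in kept.columns)

# A list of columns creates a multi-level index
multi = df.set_index(['location', 'name'])
print(multi.index.nlevels)

# verify_integrity=True raises a ValueError when the new index has duplicates
dupes = pd.DataFrame({'name': ['Dev', 'Dev'], 'age': [20, 21]})
duplicate_error = None
try:
    dupes.set_index('name', verify_integrity=True)
except ValueError as err:
    duplicate_error = str(err)
    print(duplicate_error)
```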

Access the data

The output shows that the index is now the name column. So let us access the data in the DataFrame by using the name of the student as the row label.

For example, we can simply use the following code to get the grades for all students:

df['grade']


Output:

students grade output

However, to get the grade for Kevesh, we can use the loc indexer, which accesses data by label.

Below is the command, where Kevesh is the row label and grade is the column that we want to access:

df.loc['Kevesh', 'grade']


Output:

output for Kevesh grade

Below is the command to get full details of Kevesh:

df.loc['Kevesh']


Output:

output for Kevesh
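The access patterns above can be collected into one runnable sketch. It reuses the student data from the earlier example; the use of iloc for position-based access is an addition for comparison and is not discussed in the article.

```python
import pandas as pd

student_data = {
    'name': ['Dev', 'Kevesh', 'Smith'],
    'age': [20, 23, 21],
    'grade': [85, 90, 95],
    'location': ['New Delhi', 'Dehradun', 'Atlanta'],
}
df = pd.DataFrame(student_data).set_index('name')

all_grades = df['grade']                    # one column, indexed by name
kevesh_grade = df.loc['Kevesh', 'grade']    # one cell: row label, column label
kevesh_row = df.loc['Kevesh']               # the full row as a Series
second_row = df.iloc[1]                     # the same row, selected by position

print(kevesh_grade)
print(kevesh_row)
```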

Resetting the index

The reset_index() method is used to reset the index of a DataFrame. It removes the custom index and replaces it with an integer index starting from 0. Take a look at the code below:


import pandas as pd

student_data = {
'name': ['Dev', 'Kevesh', 'Smith'],
'age': [20, 23, 21],
'grade': [85, 90, 95],
'location': ['New Delhi', 'Dehradun', 'Atlanta'],
}
df = pd.DataFrame(student_data)

# Set the index to the name column
df = df.set_index('name')

# Reset the index
df = df.reset_index()

print(df)


Output:

resetting index output


In the above code, the reset_index() method resets the index. It moves name out of the index and back into a regular column, and replaces the index with an integer index starting from 0.
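As a small sketch of a common variation, reset_index() also accepts drop=True, which discards the old index instead of turning it back into a column. The variable names restored and dropped are illustrative.

```python
import pandas as pd

student_data = {
    'name': ['Dev', 'Kevesh', 'Smith'],
    'age': [20, 23, 21],
    'grade': [85, 90, 95],
}
df = pd.DataFrame(student_data).set_index('name')

# Default behaviour: the old index becomes a regular column again
restored = df.reset_index()
print(restored.columns.tolist())

# drop=True: the old index is discarded entirely
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())
```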

Sorting the index 

The sort_index() method is used to sort the index of the DataFrame. The sort_index() method can be used to sort the index in ascending or descending order.

See the following example on how to use sort_index():
 


import pandas as pd

student_data = {
'name': ['Dev', 'Kevesh', 'Smith'],
'age': [20, 23, 21],
'grade': [85, 90, 95],
'location': ['New Delhi', 'Dehradun', 'Atlanta'],
}

df = pd.DataFrame(student_data)
df = df.sort_index(ascending=False)

print(df)


Output:

sorting index output

In the above code, we pass ascending=False to the sort_index() method in order to sort the index in descending order. Since this DataFrame uses the default integer index, the rows appear in the order 2, 1, 0.
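sort_index() works the same way with label-based indexes. The following sketch uses a small, deliberately unsorted name index (made-up data) to show both directions:

```python
import pandas as pd

# The index labels are intentionally out of alphabetical order
df = pd.DataFrame({'grade': [95, 85, 90]},
                  index=['Smith', 'Dev', 'Kevesh'])

ascending_df = df.sort_index()                   # ascending is the default
descending_df = df.sort_index(ascending=False)

print(ascending_df)
print(descending_df)
```

The rows move together with their labels, so each grade stays attached to the right student.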

What is Pandas ReIndex?

Reindexing in Pandas refers to conforming a DataFrame to a new index. It realigns the existing data to a new set of index values when we need to update or modify the index. Reindexing is important for ensuring that the data is correctly aligned for analysis.

With reindex(), we specify a new index and Pandas aligns the existing data to it. If the new index contains labels that were not present in the original DataFrame, Pandas fills the corresponding rows with NaN (Not a Number). This is useful when we want to extend the data to cover missing time periods or index values.

Syntax

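Below is the syntax of the reindex() method:

dataframe.reindex(labels, method, copy, level, fill_value, limit, tolerance)

The new labels can also be passed through the index or columns keyword arguments to target rows or columns explicitly.

Parameters

The parameters are described below:

labels: List or other array-like containing the new row or column labels.

method: One of None, 'backfill' / 'bfill', 'pad' / 'ffill', 'nearest'. Optional, default None. The method used to fill the gaps in the reindexed DataFrame: 'pad' / 'ffill' carries the last valid value forward, 'backfill' / 'bfill' uses the next valid value, and 'nearest' uses the nearest valid value. It only applies to indexes that are monotonically increasing or decreasing.

copy: True or False. Optional, default True. Return a new object (a copy) even when the new index is the same as the old one.

level: Number or label. Optional. The level of a multi-level index to match the new labels against.

fill_value: Scalar. Optional, default NaN. Specifies the value to use for missing values.

limit: Number. Optional. Limits the number of consecutive missing values filled when a method is given.

tolerance: Optional. The maximum distance allowed between an original label and a new label for an inexact match.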
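To make the method and limit parameters more concrete, here is a minimal sketch on a made-up DataFrame with a sorted integer index. The names sparse, filled and limited are illustrative only.

```python
import pandas as pd

# Only labels 0 and 5 have data; the index is monotonically increasing
sparse = pd.DataFrame({'value': [10, 20]}, index=[0, 5])
new_index = list(range(8))

# Forward fill: every new label takes the last known value
filled = sparse.reindex(new_index, method='ffill')

# limit=2: at most two consecutive new labels are filled after each known one
limited = sparse.reindex(new_index, method='ffill', limit=2)

print(filled)
print(limited)
```

With limit=2, the labels 3 and 4 stay as NaN because they are more than two steps away from the last known value.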

Example of ReIndex

Now see the example below to understand it better:


import pandas as pd

# Sample data: temperatures for four cities
new_data = {'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
            'Temperature': [75, 82, 68, 90]}
new_df = pd.DataFrame(new_data)

# Setting 'City' as the index
new_df.set_index('City', inplace=True)

# Reindexing to include a new city
new_index = ['Los Angeles', 'Chicago', 'New Orleans']
reindexed_df = new_df.reindex(new_index)
print(reindexed_df)


Output:

reindex output

In the above output, New Orleans is included in the reindexed DataFrame. As it was not present in the original DataFrame, its temperature is shown as NaN (missing data): reindexing aligns the data to the new index values and fills in missing values with NaN by default. New York and Houston are not part of the new index, so their rows are left out.
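If NaN is not the desired placeholder, the fill_value parameter can supply one. The sketch below rebuilds the same city data and uses 0 as an assumed placeholder; whether 0 is a sensible default depends on the data.

```python
import pandas as pd

df = pd.DataFrame(
    {'Temperature': [75, 82, 68, 90]},
    index=['New York', 'Los Angeles', 'Chicago', 'Houston'],
)
new_index = ['Los Angeles', 'Chicago', 'New Orleans']

with_nan = df.reindex(new_index)                    # default: NaN for new labels
with_default = df.reindex(new_index, fill_value=0)  # custom placeholder

print(with_nan)
print(with_default)
```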

Why Reindex?

Reindexing comes in very handy when we deal with time series data, where there can be missing dates or irregular intervals. Reindexing to a complete, regular index keeps the data consistent. The gaps are filled with the default value, NaN, or with any custom value we choose.
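The time series case can be sketched as follows. The dates and the units column are made-up sample data, and forward filling is just one possible choice for handling the gap.

```python
import pandas as pd

# Daily data with 2024-01-02 missing
dates = pd.to_datetime(['2024-01-01', '2024-01-03', '2024-01-04'])
sales = pd.DataFrame({'units': [5, 7, 6]}, index=dates)

# A complete daily range to reindex against
full_range = pd.date_range('2024-01-01', '2024-01-04', freq='D')

with_gaps = sales.reindex(full_range)                # missing day becomes NaN
carried = sales.reindex(full_range, method='ffill')  # missing day copies the previous day

print(with_gaps)
print(carried)
```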

Frequently Asked Questions

What is the use case of the Pandas Index?

The Pandas Index provides a label for each row or entry, which makes data retrieval by label easy and fast and lets Pandas align data automatically during analysis.

How are missing values handled in Reindexing?

By default, missing values are filled with NaN during reindexing. A custom value can be supplied through fill_value, or a fill method such as 'ffill' can be used. This keeps the data consistent when the original index is irregular or incomplete.

How to change the default index in a Pandas DataFrame? 

To change the default index, we can use the set_index() method. This allows us to customise the index based on the characteristics of the data or the needs of the analysis.

Conclusion

Congratulations, you did a fantastic job! This article covered the Pandas Index and Pandas Reindex with examples, followed by some frequently asked questions.


We hope you liked this article.

"Have fun coding!"
