Introduction
Pandas is a widely used open-source data manipulation and analysis library for Python. It offers a variety of powerful tools for handling and analysing structured data.
Two important concepts are the Pandas Index and Pandas Reindex. They both play an important role in efficiently organising, accessing, and transforming the data.
Continue reading the article to learn about the Pandas Index and Pandas Reindex.
What is the Pandas Index?
An index in Pandas is a data structure which is used to label the rows or entries of a DataFrame. A DataFrame is a two-dimensional, labelled table of rows and columns, and it is often built from sources such as JSON, CSV, Web APIs, etc.
The index acts as a reference for quick data retrieval, alignment, and manipulation. Each row or entry in a DataFrame is attached to an index value (usually, though not necessarily, unique) that helps to identify and access specific data points.
Note: an Index object is immutable, which means its individual labels cannot be changed in place once it is created. The index of a DataFrame can still be replaced as a whole with a new one.
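The immutability note can be checked with a short sketch. The labels and the value column below are assumptions chosen only for illustration:

```python
import pandas as pd

# Hypothetical labels, used only for illustration.
idx = pd.Index(['a', 'b', 'c'])

try:
    idx[0] = 'z'  # changing a single label in place is not allowed
    mutated = True
except TypeError:
    mutated = False

# Replacing the whole index of a DataFrame is allowed.
df = pd.DataFrame({'value': [1, 2, 3]}, index=idx)
df.index = ['x', 'y', 'z']

print(mutated)
print(list(df.index))
```

The in-place assignment raises a TypeError, while assigning a complete new list of labels to df.index succeeds.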
Default Index
By default, a Pandas DataFrame is created with an integer index which starts from 0. This is the default index. But we can also create our own custom index by passing a list or array of labels to the index argument when creating the DataFrame.
To use the default integer index, we don't need to do anything special; Pandas sets it automatically.
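As a quick illustration of the default index, here is a minimal sketch; the Department column and its values are assumed sample data:

```python
import pandas as pd

# Assumed sample data; no index argument is passed.
df = pd.DataFrame({'Department': ['HR', 'IT', 'Sales']})

print(df.index)  # RangeIndex(start=0, stop=3, step=1)
```

Pandas stores the default index as a RangeIndex, which labels the rows 0, 1, 2 without any extra work from us.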
Creating a Custom Index
To create a custom index, pass a list or array of labels to the index argument while creating the DataFrame.
For example, the following code creates a DataFrame with a custom index of the names of the employees:
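A minimal sketch of such code is given here. Only the three employee names come from this article; the Department and Salary columns and their values are assumptions made for illustration:

```python
import pandas as pd

# Department and Salary values are assumed sample data.
employee_data = {
    'Department': ['HR', 'IT', 'Sales'],
    'Salary': [50000, 65000, 55000],
}

# Pass the row labels through the index argument.
df = pd.DataFrame(employee_data, index=['John Doe', 'Jane Doe', 'Peter Smith'])
print(df)
```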
The output of the code shows that the index now consists of the names of the employees, because we set the index to the list ['John Doe', 'Jane Doe', 'Peter Smith'].
How to use Pandas Index
Let us discuss some examples of how to use the Pandas Index.
Setting the Index
To set the index of a DataFrame to a different column or set of columns, we can use the set_index() method.
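A minimal sketch of this step follows. The variable student_data, the name and grade columns, and the student Kevesh are mentioned in this article; the other students, the age column, and all values are assumed sample data:

```python
import pandas as pd

# Assumed sample records, for illustration only.
student_data = {
    'name': ['Kevesh', 'Asha', 'Ravi'],
    'grade': ['A', 'B', 'C'],
    'age': [20, 21, 19],
}

df = pd.DataFrame(student_data)

# set_index() returns a new DataFrame, so the result is assigned back to df.
df = df.set_index('name')
print(df)
```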
In the above code, we first create a DataFrame from student_data. Then we use the set_index() method to set the index to the name column, so the values of the name column become the row labels of the DataFrame.
Access the data
The output shows that the index is now the name column. So let us access the data in the DataFrame by using the name of the student as the row label.
For example, we can simply use the following code to get the grades for all students:
df['grade']
Output:
However, to get the grade for Kevesh alone, we can use loc, the indexer that accesses data by label.
Below is the command, where Kevesh is the row label and grade is the column that we want to access:
df.loc['Kevesh', 'grade']
Output:
Below is the command to get full details of Kevesh:
df.loc['Kevesh']
Output:
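The three lookups above can be tried together in one sketch. The student records below are assumed sample data, not the article's original student_data:

```python
import pandas as pd

# Assumed sample records, for illustration only.
student_data = {
    'name': ['Kevesh', 'Asha', 'Ravi'],
    'grade': ['A', 'B', 'C'],
    'age': [20, 21, 19],
}
df = pd.DataFrame(student_data).set_index('name')

all_grades = df['grade']                  # Series of grades, labelled by name
kevesh_grade = df.loc['Kevesh', 'grade']  # single value by row and column label
kevesh_row = df.loc['Kevesh']             # full row for Kevesh as a Series

print(all_grades)
print(kevesh_grade)
print(kevesh_row)
```

Passing the row and column labels to loc in a single call avoids the intermediate row lookup that chained indexing performs.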
Resetting the index
The reset_index() method is used to reset the index of a DataFrame. It removes the custom index and replaces it with an integer index starting from 0. Take a look at the code below:
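A minimal sketch of resetting the index, using assumed sample student records:

```python
import pandas as pd

# Assumed sample records, for illustration only.
student_data = {
    'name': ['Kevesh', 'Asha', 'Ravi'],
    'grade': ['A', 'B', 'C'],
}
df = pd.DataFrame(student_data).set_index('name')

# reset_index() returns a new DataFrame with the default integer index.
reset_df = df.reset_index()
print(reset_df)
```

By default the old index labels are kept as a regular column. Passing drop=True discards them instead.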
In the above code, the reset_index() method resets the index. It moves name out of the index and back into an ordinary column, and replaces the index with an integer index starting from 0.
Sorting the index
The sort_index() method sorts a DataFrame by its index, in either ascending (the default) or descending order.
See the following example on how to use sort_index():
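A minimal sketch of sorting by index, using assumed sample student records:

```python
import pandas as pd

# Assumed sample records, for illustration only.
student_data = {
    'name': ['Kevesh', 'Asha', 'Ravi'],
    'grade': ['A', 'B', 'C'],
}
df = pd.DataFrame(student_data).set_index('name')

# Sort the rows by their index labels, from Z to A.
sorted_df = df.sort_index(ascending=False)
print(sorted_df)
```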
In the above code, we pass ascending=False to the sort_index() method to sort the index in descending order.
What is Pandas ReIndex?
Reindexing in Pandas is the process of conforming a DataFrame to a new index. It realigns the data according to a new set of index values when we need to update or modify the index, which ensures that the data is correctly aligned for analysis.
The reindex() method lets us specify a new index and align the existing data to it. When a DataFrame is reindexed, any label in the new index that has no matching row in the original data is filled with NaN (Not a Number) by default. This is useful when we want to extend the data to cover missing time periods or index values.
Required. String or list containing row indexes or column labels
method (None, 'backfill', 'bfill', 'pad', 'ffill', 'nearest'): Optional. Specifies how to fill missing values in the reindexed data; it applies only to an index that is monotonically increasing or decreasing. 'pad' and 'ffill' carry the last valid value forward, 'backfill' and 'bfill' use the next valid value, and 'nearest' uses the nearest valid value.
copy (True, False): Optional, default True. Return a new object (a copy) even when all new indexes are the same as the old.
level (Number, Label): Optional.
fill_value (scalar value): Optional, default NaN. Specifies the value to use for missing values.
limit (Number): Optional. Limits the number of consecutive missing values filled using the method.
tolerance: Optional.
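The fill_value and method parameters can be compared in a small sketch. The integer index and the temperature values below are assumptions chosen for illustration:

```python
import pandas as pd

# Assumed sample data with gaps at index labels 1, 3 and 5.
df = pd.DataFrame({'Temperature': [75, 82, 68]}, index=[0, 2, 4])
new_index = [0, 1, 2, 3, 4, 5]

with_nan = df.reindex(new_index)                        # gaps become NaN
with_zero = df.reindex(new_index, fill_value=0)         # gaps become 0
forward_filled = df.reindex(new_index, method='ffill')  # gaps copy the previous row

print(with_nan)
print(with_zero)
print(forward_filled)
```

The method argument needs a monotonically increasing or decreasing index, which is why this sketch uses sorted integer labels.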
Example of ReIndex
Now see the example below to understand it better:
Python
import pandas as pd

# Adding a new city to our data
new_data = {'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'], 'Temperature': [75, 82, 68, 90]}
new_df = pd.DataFrame(new_data)

# Setting 'City' as the index
new_df.set_index('City', inplace=True)

# Reindexing to include a new city
new_index = ['Los Angeles', 'Chicago', 'New Orleans']
reindexed_df = new_df.reindex(new_index)
print(reindexed_df)
In the output of the above code, New Orleans is included in the reindexed DataFrame. Because it was not present in the original DataFrame, its temperature is shown as NaN (missing data): reindexing aligns the data according to the new index values and fills in missing values with NaN by default. New York and Houston do not appear in the new index, so they are dropped from the result.
Why Reindex?
Reindexing comes in very handy when we have to deal with time series data, where we can have missing dates or irregular intervals. Reindexing helps to ensure that the data remains consistent. Also, the missing values can be filled with the default value, i.e. NaN, or with any custom value.
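A minimal sketch of the time series case, assuming four days of hypothetical temperature readings with one day missing:

```python
import pandas as pd

# The complete daily range we want the data to cover (assumed dates).
full_range = pd.date_range('2024-01-01', periods=4, freq='D')

# Readings exist only for the 1st, 2nd and 4th of January.
observed_dates = full_range[[0, 1, 3]]
readings = pd.DataFrame({'Temperature': [75, 82, 68]}, index=observed_dates)

aligned = readings.reindex(full_range)               # missing day becomes NaN
filled = readings.reindex(full_range, fill_value=0)  # missing day becomes 0

print(aligned)
print(filled)
```

After reindexing, every date in the range has a row, so later calculations run over a regular daily interval.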
Frequently Asked Questions
What is the use case of the Pandas Index?
The Pandas Index provides a label for each row or entry, which makes data retrieval and alignment easier and speeds up label-based lookups.
How are missing values handled in Reindexing?
Missing values are filled with the default value (NaN) or with a custom value during reindexing. This keeps the data consistent when the original index is irregular or has gaps.
How to change the default index in a Pandas DataFrame?
To change the default index, we can use the set_index() method. This allows you to customise the index based on the characteristics of the data or the needs of the analysis.
Conclusion
Congratulations, you did a fantastic job! This article has walked through the Pandas Index and Pandas Reindex with examples, and closed with some frequently asked questions.