Table of contents
1. Introduction
2. Understanding Indexes in Pandas
3. What is a MultiIndex?
4. MultiIndex for Rows
4.1. Syntax
4.2. Parameters
5. MultiIndex for Columns
6. Ways to Create the MultiIndex
6.1. Using the MultiIndex.from_tuples() method
6.1.1. Syntax
6.2. Using the MultiIndex.from_arrays() method
6.3. Using the MultiIndex.from_product() method
7. Sorting Data in MultiIndex Dataframe
7.1. Sorting by single level
7.2. Sorting by Multiple levels
8. Advantages of Pandas MultiIndex
8.1. Indexing and Slicing with MultiIndex
8.2. Aggregation and Grouping
8.3. Reshaping and Stacking
9. Disadvantages of Pandas MultiIndex
9.1. Increased memory usage
9.2. Increased complexity
9.3. Decreased performance
9.4. Error prone and less readability
10. Frequently Asked Questions
10.1. How do I convert MultiIndex columns to Pandas single index columns?
10.2. How do you set an index on multiple columns in pandas?
10.3. How do you sort MultiIndex columns by level in Pandas?
11. Conclusion
Last Updated: Mar 27, 2024

Pandas MultiIndex


Introduction

Pandas, the popular data manipulation library in Python, offers a powerful feature known as MultiIndex. This feature allows you to work with complex hierarchical and multi-dimensional data more efficiently and intuitively. 


In this article, we'll discuss Pandas MultiIndex and explore how it can enhance your data analysis capabilities.

Before we start, let us first look at indexes in Pandas.

Understanding Indexes in Pandas 

In simple words, an index in Pandas is a labeled array that lets you identify and access rows or elements within a DataFrame or Series. You can compare an index with the primary key of a table, although Pandas does not require index labels to be unique.

It serves as a reference, similar to the row numbers in a spreadsheet, but with more flexibility and functionality. An index can be thought of as a guide that helps you navigate and retrieve data efficiently.

The image below illustrates the idea:

Understanding Indexes in Pandas

Here, the indexes represent the addresses of the houses. So, by knowing the address of any house you can easily go there. Similarly, to access data you need an index. 
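
The address analogy can be sketched in code. This is a minimal illustration that assumes a small, made-up table of houses (the owner names and addresses are hypothetical); it shows a row being looked up by its index label rather than by its position.

```python
import pandas as pd

# Hypothetical house data; the street addresses act as the index labels
houses = pd.DataFrame(
    {"Owner": ["Asha", "Ben", "Chen"], "Rooms": [3, 2, 4]},
    index=["12 Oak St", "14 Oak St", "16 Oak St"],
)

# Look up a row by its label (the "address"), not by its position
ben_row = houses.loc["14 Oak St"]
print(ben_row["Owner"])

# Look up a single cell by row label and column name
print(houses.loc["16 Oak St", "Rooms"])
```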

What is a MultiIndex? 

A MultiIndex is an advanced indexing method in Pandas that enables you to assign multiple index levels to a DataFrame or Series. This is particularly useful when you're dealing with data that possesses more than one dimension or categorical hierarchy. 

Think of it as organizing your data into layers, similar to a spreadsheet with rows and columns.

Let's start by explaining MultiIndex using examples for both row and column indices.

MultiIndex for Rows

Suppose you have a dataset containing information about sales transactions for different products in different regions, with the following columns: 'Product', 'Region', 'Date', and 'Sales'.

import pandas as pd

# Sample sales data
data = {
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Region': ['North', 'North', 'South', 'South', 'East'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-02', '2023-01-01'],
    'Sales': [100, 150, 75, 120, 90]
}

df = pd.DataFrame(data)
print(df)


Output

Output of Default Indexing

To create a MultiIndex for rows, we use the set_index() method, which sets one or more columns as the row index of a DataFrame. Passing more than one column produces a MultiIndex.

Syntax
 

The syntax is as follows:

df.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Parameters 

Let us break down the syntax and understand each parameter. 

  • keys: This is a list of column names or an array of values that will be used as the new index
     
  • drop: This is a boolean value that determines whether the columns used as the new index will be dropped from the DataFrame. By default, this is set to True
     
  • append: This is a boolean value that determines whether the columns used as the new index will be appended to the existing index. By default, this is set to False
     
  • inplace: This is a boolean value that determines whether the operation will be performed in place. By default, this is set to False, which means that a new DataFrame will be created
     
  • verify_integrity: This is a boolean value that determines whether the new index will be checked for duplicates. By default, this is set to False
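
The effect of these parameters can be sketched as follows. This is a minimal example on a small, made-up sales table; the variable names (kept, appended, duplicate_error) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "North", "South"],
    "Date": ["2023-01-01", "2023-01-02", "2023-01-01"],
    "Sales": [100, 150, 75],
})

# drop=False keeps 'Region' as a regular column as well as the index
kept = df.set_index("Region", drop=False)

# append=True adds 'Date' as a further level below the existing 'Region' index
appended = df.set_index("Region").set_index("Date", append=True)

# verify_integrity=True raises ValueError when the new index has duplicates
try:
    df.set_index("Region", verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

print(list(kept.columns))
print(list(appended.index.names))
print(type(duplicate_error).__name__)
```

Since inplace defaults to False, every call above returns a new DataFrame and leaves df unchanged.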
     

Let us now use this method to create a MultiIndex for rows from the 'Region' and 'Date' columns.

# Create a MultiIndex using 'Region' and 'Date' columns
df.set_index(['Region', 'Date'], inplace=True)

 

Now, the DataFrame df has a MultiIndex with two levels: 'Region' and 'Date'. 

import pandas as pd

# Sample sales data
data = {
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Region': ['North', 'North', 'South', 'South', 'East'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-02', '2023-01-01'],
    'Sales': [100, 150, 75, 120, 90]
}

df = pd.DataFrame(data)

# Print the DataFrame with the default index
print(df)

# Create a MultiIndex using 'Region' and 'Date' columns
df.set_index(['Region', 'Date'], inplace=True)

# Print the DataFrame with the MultiIndex for rows
print(df)


Output 

Output of MultiIndex for Rows

You can access data using these index levels with the loc[] accessor, which selects rows and columns by label. With a MultiIndex, a row label is a tuple containing one value per level.

Suppose we want the sales for the 'North' region on '2023-01-01'.

This is how we access them:

import pandas as pd

# Sample sales data
data = {
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Region': ['North', 'North', 'South', 'South', 'East'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-02', '2023-01-01'],
    'Sales': [100, 150, 75, 120, 90]
}

# Build the DataFrame before setting its index
df = pd.DataFrame(data)

# Create a MultiIndex using 'Region' and 'Date' columns
df.set_index(['Region', 'Date'], inplace=True)

# Access sales data for the 'North' region on '2023-01-01'
north_sales = df.loc[('North', '2023-01-01'), 'Sales']
print("The sales data for the 'North' region on '2023-01-01' is", north_sales)

 

Output 

Output of MultiIndex for Rows
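
A full tuple is not always needed. As a hedged sketch on the same sales data, the example below selects by a partial key on the outer level and uses xs() to select on the inner level; the variable names north_rows and first_day are illustrative.

```python
import pandas as pd

data = {
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Region': ['North', 'North', 'South', 'South', 'East'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-02', '2023-01-01'],
    'Sales': [100, 150, 75, 120, 90]
}
df = pd.DataFrame(data).set_index(['Region', 'Date'])

# A partial key on the outer level returns every row for that region;
# the matched 'Region' level is dropped from the result
north_rows = df.loc['North']
print(north_rows)

# xs() selects on an inner level by name: every region on one date
first_day = df.xs('2023-01-01', level='Date')
print(first_day)
```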

MultiIndex for Columns

You can also create a MultiIndex for columns. 

Consider a dataset with information about student scores for different subjects in different terms:

import pandas as pd

# Sample student marks dataset
data = {
    ('Term 1', 'Math'): [90, 85, 78, 92],
    ('Term 1', 'Science'): [88, 89, 76, 90],
    ('Term 2', 'Math'): [91, 88, 80, 94],
    ('Term 2', 'Science'): [89, 90, 78, 92]
}

# Create a DataFrame with MultiIndex columns
df = pd.DataFrame(data, index=['Student A', 'Student B', 'Student C', 'Student D'])
print(df)


Output

Output of MultiIndex for Cols

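In this example, the columns form a MultiIndex with two levels: the term ('Term 1', 'Term 2') and the subject ('Math', 'Science'). The levels are unnamed here because the code does not assign names to them.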

You can access data using these indices:
 

# Access scores for 'Math' in 'Term 1' for 'Student A'
math_term1_studentA = df.loc['Student A', ('Term 1', 'Math')]
print("The scores for 'Math' in 'Term 1' for 'Student A' is ", math_term1_studentA)


Output 

Output of MultiIndex for Cols

MultiIndexing can be applied to both rows and columns simultaneously, providing a flexible way to represent and analyze multi-dimensional data in pandas. It allows you to perform operations like grouping, pivoting, and slicing data efficiently across multiple levels of the index. 
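
As a minimal sketch of a DataFrame with a MultiIndex on both axes, the example below assumes a hypothetical 'Class' level on the rows in addition to the term and subject levels on the columns. The class labels and the scores are made up for illustration.

```python
import pandas as pd

# Row MultiIndex: hypothetical classes and their students
rows = pd.MultiIndex.from_tuples(
    [("Class 1", "Student A"), ("Class 1", "Student B"), ("Class 2", "Student C")],
    names=["Class", "Student"],
)

# Column MultiIndex: every term paired with every subject
cols = pd.MultiIndex.from_product(
    [["Term 1", "Term 2"], ["Math", "Science"]], names=["Term", "Subject"]
)

scores = pd.DataFrame(
    [[90, 88, 91, 89], [85, 89, 88, 90], [78, 76, 80, 78]],
    index=rows,
    columns=cols,
)

# One cell: a row tuple and a column tuple
cell = scores.loc[("Class 1", "Student B"), ("Term 2", "Math")]
print(cell)

# All 'Term 1' columns: a partial key on the outer column level
term1 = scores["Term 1"]
print(term1)
```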

Ways to Create the MultiIndex

There are several ways to create a MultiIndex in pandas. We have already seen two: the set_index() method for rows, and a dictionary with tuple keys for columns. Pandas also provides dedicated constructors for building a MultiIndex directly.

Here are some of the most common ways:

Using the MultiIndex.from_tuples() method

The MultiIndex.from_tuples() method is a way to create a MultiIndex from a list of tuples. 

Syntax

The syntax is as follows:

MultiIndex.from_tuples(tuples, sortorder=None, names=None)


Let us break down the syntax: 

  • tuples: This is a list of tuples, where each tuple holds the labels of a single row or column, one label per level
     
  • sortorder: This is an optional integer giving the level of sortedness; the tuples must already be sorted lexicographically up to that level. By default, this is set to None
     
  • names: This is a list of names for the levels of the MultiIndex. If not specified, the levels are left unnamed (None)
     

Example: Let us create a list of tuples, each pairing a country with one of its cities.

import pandas as pd

tuples = [('India', 'New Delhi'), ('Pakistan', 'Lahore'), ('UK', 'London')]
multi_index = pd.MultiIndex.from_tuples(tuples, names=['country', 'city'])

# Create a sample DataFrame using the MultiIndex
data = {'Population (millions)': [1380, 13.2, 8.9]}
df = pd.DataFrame(data, index=multi_index)
print(df)

 

Output 

MultiIndex.from_tuples() method

Now, let us say you want to access the Population of London. Here’s how you achieve this: 

# Get the Population of London
pop_UK = df.loc[('UK', 'London'), 'Population (millions)']

# Print the result
print("Population of London", pop_UK, "millions")

 

Output 

 

MultiIndex.from_tuples() method

Using the MultiIndex.from_arrays() method

The MultiIndex.from_arrays() method creates a MultiIndex from a list of arrays, where each array holds the labels for one level and all arrays have the same length.

Besides the arrays themselves, it accepts the optional sortorder and names arguments.
 

Example 

import pandas as pd

# Create arrays for the MultiIndex levels
level1 = ['A', 'A', 'B', 'B']
level2 = ['Python', 'C++', 'Python', 'C++']

# Create a MultiIndex from the arrays
multi_index = pd.MultiIndex.from_arrays([level1, level2], names=['Ninja', 'Subject'])

# Create a sample DataFrame with the MultiIndex
data = {'Marks Out of 50': [40, 34, 30, 40]}
df = pd.DataFrame(data, index=multi_index)

# Display the DataFrame
print(df)


Output 

MultiIndex.from_arrays() method

Now, if you want to access the marks, you can use the loc[] accessor.

Say you want to get the marks for Ninja A in C++. Here's how you achieve this:

# Get the marks for Ninja A in C++
marks_ninja_a_cplus = df.loc[('A', 'C++'), 'Marks Out of 50']

# Print the result
print("Marks for Ninja A in C++:", marks_ninja_a_cplus)


Output

Output for MultiIndex.from_arrays() method

Using the MultiIndex.from_product() method

The MultiIndex.from_product() method is used to create a MultiIndex by taking the Cartesian product of multiple iterables (e.g., lists or arrays) to form index levels. 

Each unique combination of values from these iterables forms a unique index label. 

Confused? Don’t worry, we will do this together. 

Let us take one example to understand how it works:

Example 

Suppose we have a list of colors and a list of sizes, and every color is sold in every size. We will create a DataFrame that holds the price of each combination, using the MultiIndex.from_product() method to build the Cartesian product of colors and sizes.

import pandas as pd

# Create lists representing levels of the MultiIndex
colors = ['Red', 'Green', 'Blue']
sizes = ['Small', 'Medium', 'Large']

# Create a MultiIndex from the Cartesian product of these lists
multi_index = pd.MultiIndex.from_product([colors, sizes], names=['Color', 'Size'])

# Create a DataFrame with the MultiIndex
data = {'Price': [5, 10, 15, 8, 12, 18, 7, 14, 21]}
df = pd.DataFrame(data, index=multi_index)

# Print the DataFrame
print(df)


Output 

Output for MultiIndex.from_product() method

Now, let us say you want to get the price for Medium Blue color. Here’s how you achieve this:

# Get the price of Blue color of Medium Size
price_of_medium_blue = df.loc[('Blue', 'Medium'), 'Price']

# Print the result
print("Price of Blue color of Medium Size :", price_of_medium_blue)

 

Output 

Output for MultiIndex.from_product() method
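
To make the Cartesian product concrete, the sketch below builds the same pairs by hand with Python's itertools.product and passes them to from_tuples(). It is only an illustration of what from_product() produces, not a description of how pandas implements it internally.

```python
import itertools
import pandas as pd

colors = ['Red', 'Green', 'Blue']
sizes = ['Small', 'Medium', 'Large']

# from_product() pairs every color with every size: 3 x 3 = 9 labels
from_product = pd.MultiIndex.from_product([colors, sizes], names=['Color', 'Size'])

# The same labels built explicitly, then passed to from_tuples()
pairs = list(itertools.product(colors, sizes))
from_tuples = pd.MultiIndex.from_tuples(pairs, names=['Color', 'Size'])

print(len(from_product))
print(from_product.equals(from_tuples))
```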

Sorting Data in MultiIndex Dataframe 

We can also sort the data in a MultiIndex DataFrame to keep it organized. Sorting arranges the rows at every index level, which makes the data easier to access, analyze, and visualize. We can sort by a single level or by multiple levels.

sorting example

To sort a MultiIndex DataFrame, we can use the .sort_index() method.

Let us look at some examples that demonstrate this.

Consider a dataset containing information about sales transactions for different products in different regions.

Sorting by single level

import pandas as pd

# Sample sales data
data = {
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Region': ['North', 'North', 'South', 'South', 'East'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-02', '2023-01-01'],
    'Sales': [100, 50, 75, 100, 90]
}

df = pd.DataFrame(data)

# Create a MultiIndex using 'Region' and 'Sales' columns
df.set_index(['Region', 'Sales'], inplace=True)

# Sort the data frame based on 'Sales' in ascending order
sorted_df_by_sales = df.sort_index(level='Sales', ascending=True)
print(sorted_df_by_sales)

 

Output

sorting

In this example, we defined a multiindex on levels ‘Region’ and ‘Sales’, and then we used the method .sort_index on the data frame to sort the data by ‘Sales’ in ascending order.

Sorting by Multiple levels

import pandas as pd

# Sample sales data
data = {
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Region': ['North', 'North', 'South', 'South', 'East'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-02', '2023-01-01'],
    'Sales': [100, 50, 75, 100, 90]
}

df = pd.DataFrame(data)

# Create a MultiIndex using 'Region' and 'Sales' columns
df.set_index(['Region', 'Sales'], inplace=True)

# Sort the data frame based on 'Sales' in ascending order and then based on 'Region' in descending order
sorted_df_by_sales_and_region = df.sort_index(level=['Sales', 'Region'], ascending=[True, False])
print(sorted_df_by_sales_and_region)

Output

sorting

In this example, we used the method .sort_index on the data frame to sort the data by multiple levels i.e. sort the data by ‘Sales’ in ascending order, and then by ‘Region’ in descending order.
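
Sorting also matters for label-based range slicing, which expects the index to be sorted. The sketch below reuses the sales data indexed by 'Region' and 'Date', checks whether the index is sorted, and slices a range of regions after sorting; the variable names are illustrative.

```python
import pandas as pd

data = {
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Region': ['North', 'North', 'South', 'South', 'East'],
    'Date': ['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-02', '2023-01-01'],
    'Sales': [100, 150, 75, 120, 90]
}
df = pd.DataFrame(data).set_index(['Region', 'Date'])

# The input order is North, South, East, so the index is not sorted yet
print(df.index.is_monotonic_increasing)

# sort_index() without arguments sorts by all levels, outermost first
sorted_df = df.sort_index()
print(sorted_df.index.is_monotonic_increasing)

# A range slice on the outer level, taken from the sorted index
east_to_north = sorted_df.loc['East':'North']
print(east_to_north)
```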

Advantages of Pandas MultiIndex

There are several advantages of Pandas MultiIndex, some of which are discussed below: 


Indexing and Slicing with MultiIndex

Once you have a MultiIndex DataFrame, indexing and slicing become more versatile. With .loc[] you can select by a full key, by a partial key on the outer levels, or by a slice across levels, while .iloc[] continues to select purely by position.
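
A minimal sketch of slicing across levels, assuming a small made-up sales table with one row per region and date. It uses pd.IndexSlice, the pandas helper for writing multi-level slices inside .loc[].

```python
import pandas as pd

# Hypothetical sales figures for every region/date combination
index = pd.MultiIndex.from_product(
    [["East", "North", "South"], ["2023-01-01", "2023-01-02"]],
    names=["Region", "Date"],
)
sales = pd.DataFrame({"Sales": [90, 95, 100, 150, 75, 120]}, index=index)

idx = pd.IndexSlice

# Every region, but only one date (a slice on the inner level)
jan_2 = sales.loc[idx[:, "2023-01-02"], :]
print(jan_2)

# A range of regions, all dates
east_north = sales.loc[idx["East":"North", :], :]
print(east_north)

# .iloc[] ignores the labels and selects by position
first_row = sales.iloc[0]
print(first_row["Sales"])
```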

Aggregation and Grouping

MultiIndex can significantly simplify aggregation and grouping operations. You can group data by one or more levels and compute summary statistics or perform custom aggregation functions.
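
As a hedged sketch, grouping by an index level can be written with groupby(level=...). The sales figures below are made up for illustration.

```python
import pandas as pd

# Hypothetical sales figures for every region/date combination
index = pd.MultiIndex.from_product(
    [["East", "North", "South"], ["2023-01-01", "2023-01-02"]],
    names=["Region", "Date"],
)
sales = pd.DataFrame({"Sales": [90, 95, 100, 150, 75, 120]}, index=index)

# Total sales per region: group by the outer index level
per_region = sales.groupby(level="Region")["Sales"].sum()
print(per_region)

# Best sale per date: group by the inner index level
per_date = sales.groupby(level="Date")["Sales"].max()
print(per_date)
```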

Reshaping and Stacking

MultiIndex makes reshaping and stacking operations more intuitive. You can pivot your data between the index levels and columns using methods like .unstack() and .stack().
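
The round trip between long and wide layouts can be sketched as follows, again on made-up sales figures: unstack() moves an index level into the columns, and stack() moves the columns back into the index.

```python
import pandas as pd

# Hypothetical sales figures for every region/date combination
index = pd.MultiIndex.from_product(
    [["East", "North", "South"], ["2023-01-01", "2023-01-02"]],
    names=["Region", "Date"],
)
sales = pd.DataFrame({"Sales": [90, 95, 100, 150, 75, 120]}, index=index)

# Move the 'Date' level into the columns: one row per region, one column per date
wide = sales["Sales"].unstack(level="Date")
print(wide)

# Move the date columns back into the index to recover the long layout
long_again = wide.stack()
print(long_again)
```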

Disadvantages of Pandas MultiIndex

Multiindexing in Pandas offers many advantages, but it also brings drawbacks and challenges that need to be weighed against those advantages when deciding whether it fits your use case.

Some of the cons and challenges of Pandas MultiIndex are discussed below:

Increased memory usage

Multiindexing can increase memory usage, because every additional index level stores its own index data. This may not scale well on systems with limited memory that need to handle large datasets.

Increased complexity

In some cases, multiindexing can overcomplicate your data structure, since code that deals with multiple index levels is more involved. It is important to assess whether the added complexity is justified by the advantages of multiindexing for your use case.

Decreased performance

Multiindexing supports aggregation and grouping operations, but the extra index levels add overhead: Pandas has to process multiple levels when grouping, reshaping, and performing similar operations.

Error prone and less readability

As the index structure of a MultiIndex DataFrame becomes more complex, the chance of errors in the analysis increases. The code also becomes harder to read as the code base evolves, which can lead to mistakes when selecting, indexing, or manipulating the data.

Frequently Asked Questions

How do I convert MultiIndex columns to Pandas single index columns?

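To revert the row index of a DataFrame from a MultiIndex to a single index, you can use the built-in reset_index() method, which moves the index levels back into ordinary columns. To flatten MultiIndex columns, assign new single-level labels to df.columns, for example by joining each tuple of level values into one string, or remove a level with droplevel().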
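
Both cases can be sketched as follows. The example assumes small made-up tables, and joining the level values with an underscore is just one possible naming convention for the flattened columns.

```python
import pandas as pd

# Rows: reset_index() turns the index levels back into regular columns
rows = pd.DataFrame({
    "Region": ["North", "South"],
    "Date": ["2023-01-01", "2023-01-01"],
    "Sales": [100, 75],
}).set_index(["Region", "Date"])
back = rows.reset_index()
print(list(back.columns))

# Columns: join each (term, subject) tuple into a single string label
marks = pd.DataFrame(
    {
        ("Term 1", "Math"): [90, 85],
        ("Term 1", "Science"): [88, 89],
        ("Term 2", "Math"): [91, 88],
    },
    index=["Student A", "Student B"],
)
flat = marks.copy()
flat.columns = ["_".join(col) for col in marks.columns]
print(list(flat.columns))
```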

How do you set an index on multiple columns in pandas?

In pandas, you can set an index on multiple columns by passing a list of column names to the set_index() method of a DataFrame. This will create a MultiIndex, also known as a hierarchical index, with the specified columns as its levels.

How do you sort MultiIndex columns by level in Pandas?

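To sort MultiIndex columns by level, call sort_index() on the DataFrame with axis=1 and the level argument, for example df.sort_index(axis=1, level=0). The older DataFrame.sortlevel() method is no longer available in current pandas versions.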
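
A minimal sketch of sorting columns by level with DataFrame.sort_index(axis=1, level=...), assuming a made-up marks table whose columns start out in an arbitrary order. The level names 'Term' and 'Subject' are assigned here for illustration.

```python
import pandas as pd

# Columns deliberately listed out of order
data = {
    ("Term 2", "Science"): [89, 90],
    ("Term 1", "Science"): [88, 89],
    ("Term 2", "Math"): [91, 88],
    ("Term 1", "Math"): [90, 85],
}
df = pd.DataFrame(data, index=["Student A", "Student B"])
df.columns.names = ["Term", "Subject"]

# Sort columns by the 'Term' level; remaining levels break ties by default
by_term = df.sort_index(axis=1, level="Term")
print(list(by_term.columns))

# Sort columns by the 'Subject' level instead
by_subject = df.sort_index(axis=1, level="Subject")
print(list(by_subject.columns))
```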

Conclusion

Pandas MultiIndex is a powerful tool that unlocks the potential of working with multi-dimensional and hierarchical data in Python. By leveraging MultiIndex, you can create more structured, organized, and meaningful DataFrames, simplifying tasks such as indexing, slicing, aggregation, and reshaping.


We hope you liked this article.

"Have fun coding!"
 
