Table of contents
1.
Introduction
2.
Understanding Pandas Pivot Table
2.1.
Syntax
3.
Pandas Pivot Table
3.1.
Return Value
4.
Creating Pandas Pivot Table
4.1.
CODE
4.1.1.
OUTPUT
4.2.
EXPLANATION
5.
Filter Pandas Pivot Table
5.1.
CODE
5.1.1.
OUTPUT
5.2.
EXPLANATION
6.
Sort Pandas Pivot Table
6.1.
CODE
6.1.1.
OUTPUT
6.2.
EXPLANATION
7.
Add New Column
7.1.
CODE
7.1.1.
OUTPUT
7.2.
EXPLANATION
8.
Aggregating Functions
8.1.
CODE
8.1.1.
OUTPUT
9.
Adding Margins
9.1.
CODE
9.1.1.
OUTPUT
9.2.
EXPLANATION
10.
Multiple Indexing
10.1.
CODE
10.1.1.
OUTPUT
10.2.
EXPLANATION
11.
Fill Missing Values
11.1.
CODE
11.1.1.
OUTPUT
11.2.
EXPLANATION
12.
Create a Custom Aggregation Function 
12.1.
CODE
12.1.1.
OUTPUT
12.2.
EXPLANATION
13.
Difference between groupby() V/S pandas pivot_table()
14.
Pros of Pandas Pivot Table
15.
Cons of Pandas Pivot Table
16.
Frequently Asked Questions
16.1.
Are there any limitations of the pandas pivot table?
16.2.
Can you use the pandas pivot table with non-numerical data?
16.3.
What data science tools or libraries work well with pandas pivot table?
16.4.
Can you group data by multiple columns in the pandas pivot table? 
17.
Conclusion
Last Updated: Mar 27, 2024
Medium

Pandas Pivot Table

Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction

Hey, coders. Are you interested in data analysis and want to know more about pandas pivot table? Understanding the pandas pivot table is necessary when working with data in Python. This article will make learning easier about creating the pandas pivot tables, their properties, and how you can use them in different cases.

og image

The pandas pivot table is crucial to data analysis. Using pivot tables will help you to organize your data within a DataFrame. By the end, you'll know the pandas pivot table well. So let's explore this topic more.

Understanding Pandas Pivot Table

how to organise data?

The pandas pivot table is used for rearranging data in Python. It will allow you to group your data by one or more columns. The pandas pivot tables will make it simple for you to identify different patterns. You can customize the rows and columns in a pandas pivot table according to your needs. You can apply aggregation functions such as count, sum, mean etc, to your data. The pandas pivot tables will help you easily handle large datasets with many columns. You can filter and sort your data in a pandas pivot table. 

Syntax

ppt.pivot_table(data, values=None, index=None, columns=None, aggfunc='max', fill_value=None, margins=False, dropna=True)

Pandas Pivot Table

Here is a description of each parameter:

Parameter Description
data The DataFrame you want to use for the pandas pivot table.
values The columns are to be aggregated.
index The columns to be used for the index (rows).
columns The columns to be used for the columns in the pandas pivot table.
fill_value The value to replace missing values with.
margins A boolean indicates whether to include subtotals or grand totals.
dropna A boolean indicates whether to exclude rows or columns that contain missing values.

There are also other optional parameters that you can use to customize your pandas pivot table.

aggfunc The function for aggregation (e.g. sum, count, mean, max etc.).
margins_name A string to use as the label for the column or row having a total.

Return Value

It is used to create a DataFrame that resembles an Excel pivot table, where the pivot table's hierarchical levels are stored as MultiIndex structures in both the DataFrame's index and columns.

Creating Pandas Pivot Table

Here is an example of pivot tables in pandas:

We are creating this table using the pandas library in Python. The table in this example has a DataFrame based on the values  'coder' and 'language' columns. The coders are shown in the rows. The languages in which they code are shown in the columns. The output table shows the number of questions each coder solves in each language. The values in the table are the sums of the 'questions_solved.'

CODE

import numpy as np
import pandas as ppt

# First, create a sample DataFrame
sample = {
    'coder': ['Ninja1', 'Ninja2', 'Ninja3', 'Ninja1', 'Ninja2', 'Ninja3'],
    'language': ['Python', 'Python', 'C++', 'Java', 'Python', 'C++'],
    'questions_solved': [50, 75, 100, 25, 80, 60]
}

df = ppt.DataFrame(sample)

# Now create the pivot table
pivot_table = ppt.pivot_table(df, values='questions_solved', index='coder', columns='language', aggfunc='sum')

# Finally, print the pivot table
print(pivot_table)
You can also try this code with Online Python Compiler
Run Code

OUTPUT

output 1

EXPLANATION

This table has three columns: 'coder''language', and 'questions_solved'. The pivot table will use the 'coder' column as the rows. The pivot table will use the 'language' column in the columns. The pivot table will use the 'questions_solved' column as the value. The 'aggfuncparameter specifies the values in the table you add.

You can try many operations to manipulate and filter the pandas pivot table we created in the above example. Here are a few code lines to add:

Filter Pandas Pivot Table

You can filter the pivot table to only show rows where the total number of questions solved is greater than or equal to 100.

CODE

import numpy as np
import pandas as pd

# First, create a sample DataFrame
data = {
    'coder': ['Ninja1', 'Ninja2', 'Ninja3', 'Ninja1', 'Ninja2', 'Ninja3'],
    'language': ['Python', 'Python', 'C++', 'Java', 'Python', 'C++'],
    'questions_solved': [50, 75, 100, 25, 80, 60]
}

df = pd.DataFrame(data)

# Now create a pivot table using the pandas `pivot_table` function
pivot_table = pd.pivot_table(df, values='questions_solved', index='coder', columns='language', aggfunc=np.sum)

#Finally, creating the filtered table
filtered_table = pivot_table[pivot_table.sum(axis=1) >= 100]

print(filtered_table)
You can also try this code with Online Python Compiler
Run Code

OUTPUT

output 2

EXPLANATION

The filtering condition is that the sum of values for each row (specified by axis=1) in the pivot_table DataFrame must be greater than or equal to 100. The sum() function calculates the sum of values across each row in the pivot_table.

Sort Pandas Pivot Table

You can sort the pivot table by the total number of questions solved for each coder.

CODE

import numpy as np
import pandas as pd

# First, create a sample DataFrame
data = {
    'coder': ['Ninja1', 'Ninja2', 'Ninja3', 'Ninja1', 'Ninja2', 'Ninja3'],
    'language': ['Python', 'Python', 'C++', 'Java', 'Python', 'C++'],
    'questions_solved': [50, 75, 100, 25, 80, 60]
}

df = pd.DataFrame(data)

# Now create a pivot table using the pandas `pivot_table` function
pivot_table = pd.pivot_table(df, values='questions_solved', index='coder', columns='language', aggfunc=np.sum)

#Finally, creating the sorted table
sortedone_table = pivot_table.sort_values(by=pivot_table.sum(axis=1), ascending=False)

print(sortedone_table)
You can also try this code with Online Python Compiler
Run Code

OUTPUT

output 3


EXPLANATION

Here pivot_table.sum(axis=1) calculates the sum of values for each coder. Then sort_values() will sort a DataFrame by a column you will specify. by=pivot_table.sum(axis=1) shows that we are sorting the rows according to the sum of values for each row. ascending=False shows that we are sorting in descending order.

Add New Column

You can also add a new column to the pivot table that shows the percentage of questions solved for each language by each coder.

CODE

import numpy as np
import pandas as pd

# First, create a sample dataframe
data = {
    'coder': ['Ninja1', 'Ninja2', 'Ninja3', 'Ninja1', 'Ninja2', 'Ninja3'],
    'language': ['Python', 'Python', 'C++', 'Java', 'Python', 'C++'],
    'questions_solved': [50, 75, 100, 25, 80, 60]
}

df = pd.DataFrame(data)

# Now creating a pivot table using the pandas `pivot_table` function
pivot_table = pd.pivot_table(df, values='questions_solved', index='coder', columns='language', aggfunc=np.sum)
total_questions_solved = pivot_table.sum().sum()
pivot_table['Python %'] = pivot_table['Python'] / total_questions_solved
pivot_table['Java %'] = pivot_table['Java'] / total_questions_solved
pivot_table['C++ %'] = pivot_table['C++'] / total_questions_solved

print(pivot_table)
You can also try this code with Online Python Compiler
Run Code

OUTPUT

output 4

EXPLANATION

To calculate the percentage of questions solved for each language, we divide the number of questions solved by the total number of questions for all languages. We add new columns to the pivot_table DataFrame: Python %, Java %, and C++ %. Then finally, we print the table.

Aggregating Functions

There are many aggregation functions that you can use with the aggfunc parameter.

aggfunc = ‘?’

Function Description
mean() It calculates the mean value of the group.
count() It counts the number of non-null values in the group.
min() It calculates the minimum value in the group.
max() It calculates the maximum value in the group.
median() It calculates the median value in the group.
std() It calculates the standard deviation of the values in the group.
var() It calculates the variance of the values in the group.
prod() It calculates the product of the values in the group.
sum() It calculates the sum of the values in the group.

You can also set multiple aggregation functions for a column in a pandas pivot table by passing a list of functions to the aggfunc parameter.

CODE

import numpy as np
import pandas as ppt

# First, create a sample data frame
data = {
    'coder': ['Ninja1', 'Ninja2', 'Ninja3', 'Ninja1', 'Ninja2', 'Ninja3'],
    'language': ['Python', 'Python', 'C++', 'Java', 'Python', 'C++'],
    'questions_solved': [50, 75, 100, 25, 80, 60]
}

df = ppt.DataFrame(data)

# First of all, define the aggregation functions you want 
aggregation_functions = {'questions_solved': ['sum', 'mean']}

# Now create the pivot table 
pivot_table = ppt.pivot_table(df, values='questions_solved', index='coder', columns='language', aggfunc=aggregation_functions)

# Finally, print your pivot table
print(pivot_table)
You can also try this code with Online Python Compiler
Run Code

OUTPUT

output 5

Adding Margins

Margins can give you an easy way to get a data summary. It will be quicker for you to compare the values in different rows and columns. You can use the margins parameter in the pivot_table function to add margins to a pandas pivot table. To add margins, you can set the margins parameter to True in the pivot_table() function. 

CODE

import numpy as np
import pandas as ppt

# First, create a sample data frame
data = {
    'coder': ['Ninja1', 'Ninja2', 'Ninja3', 'Ninja1', 'Ninja2', 'Ninja3'],
    'language': ['Python', 'Python', 'C++', 'Java', 'Python', 'C++'],
    'questions_solved': [50, 75, 100, 25, 80, 60]
}

df = ppt.DataFrame(data)

# Now create the pivot table
pivot_table = ppt.pivot_table(df, values='questions_solved', index='coder', columns='language', aggfunc='sum', margins=True)

# Finally print the pivot table
print(pivot_table)
You can also try this code with Online Python Compiler
Run Code

OUTPUT

output 6

EXPLANATION

Setting the margin pattern to True will add two rows and columns to your pivot table: 

  • Showing the totals for each column.
     
  • Another for each row.
     

Multiple Indexing

You can modify the 'data' to add an extra column. Then create a pandas pivot table with multiple index columns. 

CODE

import numpy as np
import pandas as ppt

# First, create a DataFrame with an extra column according to your need
sample = {
    'coder': ['Ninja1', 'Ninja2', 'Ninja3', 'Ninja1', 'Ninja2', 'Ninja3'],
    'language': ['Python', 'Python', 'C++', 'Java', 'Python', 'C++'],
    'level': ['Beginner', 'Advanced', 'Advanced', 'Beginner', 'Intermediate', 'Intermediate'],
    'questions_solved': [50, 75, 100, 25, 80, 60]
}

df = ppt.DataFrame(sample)

# Now create the pivot table with multiple index columns
pivot_table = ppt.pivot_table(df, values='questions_solved', index=['coder', 'level'], columns='language', aggfunc='sum')

# Finally, print your pivot table
print(pivot_table)
You can also try this code with Online Python Compiler
Run Code

OUTPUT

output 7

EXPLANATION

The output table will give you a more detailed view. You can quickly analyse and compare each 'coder' performance at different levels. The index level 'coder' groups the data by the unique values in the 'coder' column. While the index level 'level' further divides the data within each 'coder' group by the unique values in the 'level' column.

Fill Missing Values

You can fill the missing values in a pivot table in python using the fillna() method.

CODE

import pandas as pd
import numpy as np

# Sample data
data = {
   'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
   'Value': [10, 20, 15, np.nan, 30, np.nan]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Create a pivot table
pivot_table = df.pivot_table(values='Value', index='Category', aggfunc=np.sum)

# Fill missing values with a specific value (e.g., 0)
pivot_table_filled = pivot_table.fillna(0)

print(pivot_table_filled)
You can also try this code with Online Python Compiler
Run Code

OUTPUT

output

EXPLANATION

In this example, we first created a pivot table using pivot_table(), and then we used fillna(0) to fill in missing values with zeros. You can replace 0 with any other value you prefer to fill in the missing data.

Create a Custom Aggregation Function 

You can define your custom aggregation function. When creating a pandas pivot table, you can pass your custom function as the aggfunc parameter.

CODE

import numpy as np
import pandas as ppt

# First, create a DataFrame
sample = {
    'coder': ['Ninja1', 'Ninja2', 'Ninja3', 'Ninja1', 'Ninja2', 'Ninja3'],
    'language': ['Python', 'Python', 'C++', 'Java', 'Python', 'C++'],
    'questions_solved': [50, 75, 100, 25, 80, 60]
}

df = ppt.DataFrame(sample)

# Define a custom aggregation function whatever your want
def sum_of_squares(x):
    return np.sum(x**2)
    
# Now create the pivot table 
pivot_table = ppt.pivot_table(df, values='questions_solved', index='coder', columns='language', aggfunc=sum_of_squares)

# Finally, print the pivot table
print(pivot_table)
You can also try this code with Online Python Compiler
Run Code

OUTPUT

output 8

EXPLANATION

We defined a custom aggregation function sum_of_squares. This function takes an input array x and will return the sum of squares. The output pivot table will have the sum of squares of the questions_solved values for each coder and language combo.

Difference between groupby() V/S pandas pivot_table()

Here are some differences between groupby() and pandas pivot_table():

groupby() pandas pivot_table()
It groups data by one or more columns and applies a function to each group. It aggregates data in a pivot table way with indexes and columns.
It gives a DataFrame or Series output. It gives a DataFrame output.
It only works with a single index. It can have multiple levels of indexes.
It can only apply one function to each group. It can apply multiple aggregation functions.
It is faster for simple grouping tasks. It is better for more complex tasks.

Pandas pivot tables have various real-life applications. It includes sales data analysis, HR data analysis, marketing campaign analysis, financial reporting, and supply chain analysis

Pros of Pandas Pivot Table

Here are some pros of using pandas pivot table:

  • It gives you a user-friendly way to summarize and analyze large datasets.
     
  • It is very customizable. It will allow you to control aggregation functions, columns, and indexes.
     
  • You can easily group data with multiple columns or levels.
     
  • It has valuable features such as adding subtotal rows/columns and sorting functions.
     
  • You can also manage missing or incomplete data.
     
  • It will provide you with an excellent visual presentation of data. It will be quicker to interpret than raw data.

Cons of Pandas Pivot Table

Here are some cons of using pandas pivot table:

  • Sometimes it can be slower with massive datasets and complex aggregations.
     
  • It requires sound technical knowledge of Pandas syntax and data manipulation.
     
  • Sometimes it can result in a loss of precision for tiny datasets.
     
  • It is not so helpful with non-numeric data.
     
  • It may be less efficient with real-time analysis and continuous data processing.

Frequently Asked Questions

Are there any limitations of the pandas pivot table?

It is challenging to handle vast datasets with millions of rows. It has limitations in handling complex multi-level indices. There are some restrictions on performing custom calculations on subsets of the data. 

Can you use the pandas pivot table with non-numerical data?

Nonetheless, should you employ a pivot_table with non-numeric values, there is a possibility of encountering a DataError: No numeric types available for aggregation.

What data science tools or libraries work well with pandas pivot table?

NumPy, Matplotlib, Seaborn, and Jupyter Notebook are some data science tools that work well. You can use these tools to manipulate, plot, and visualise data.

Can you group data by multiple columns in the pandas pivot table? 

You can make a Pandas pivot table using several columns, and it gives you a new table. You can change how it looks by adjusting the columns.

Conclusion

This article discusses the pandas pivot table's usage, definition, pros, and cons. It explains how to add margins, create custom functions, and more. The article also includes examples to help understand how to use the pandas pivot table in different cases. Refer to other pandas and Python articles to improve your understanding of the pandas pivot table.

Refer to our guided paths on Coding Ninjas Studio to learn more about DSA, Competitive Programming, JavaScript, System Design, etc. Enrol in our courses, refer to the mock test and problems look at the interview experiences and interview bundle for placement preparations.

Happy Coding!

 

Live masterclass