Table of contents
1. Introduction
2. A brief about Pandas
3. What is groupby() in Pandas?
4. What is count() in Pandas?
5. Chaining groupby() and count() in Pandas
5.1. Categorical Data Analysis
5.2. Product Inventory Analysis
5.3. Event Frequency Analysis
6. Frequently Asked Questions
6.1. What is NaN in Python?
6.2. What is an index in a Pandas series?
6.3. What is a multi-level column index in Pandas?
7. Conclusion
Last Updated: Mar 27, 2024

Groupby.count in Pandas

Author Abhinav Anand

Introduction

Pandas is a fast, flexible, and easy-to-use open-source library that provides data structures, such as DataFrame and Series, for storing structured data, along with methods for analyzing and manipulating it.


In this article, you will learn about the groupby() and count() functions in Pandas with the help of examples. The two can be chained to count the non-null values in each group of a grouped dataframe. This is useful for counting based on a grouping criterion such as category labels or timestamps.

Before getting started, let’s look at a brief introduction to Pandas.

A brief about Pandas 

Pandas is an open-source Python library that provides data manipulation and analysis tools for working with structured data. It is built on top of the NumPy library and offers data structures for efficiently working with tabular data.

To follow along with the examples, you should install the latest version of Pandas.

Now that you know about Pandas, we will move on to the groupby() function.

What is groupby() in Pandas?

The groupby() function is used for grouping a dataframe or series based on the values of one or more columns.

Depending on the data structure it is called on, this function returns an object of one of the following types:-

  • pandas.core.groupby.DataFrameGroupBy (for dataframes)
     
  • pandas.core.groupby.SeriesGroupBy (for series)
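
The two return types listed above can be checked directly. The following is a minimal sketch with made-up sample data; it assumes both classes are importable from pandas.core.groupby, as their names above suggest.

```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy, SeriesGroupBy

# Illustrative data, not taken from the article's examples
df = pd.DataFrame({
    "Department": ["HR", "IT", "HR"],
    "Salary": [60000, 75000, 55000],
})

# Grouping a dataframe yields a DataFrameGroupBy object
frame_groups = df.groupby("Department")
print(type(frame_groups).__name__)

# Selecting a single column from it yields a SeriesGroupBy object
series_groups = frame_groups["Salary"]
print(type(series_groups).__name__)
```

Neither object holds a result yet; the computation happens only when you call a method such as mean() or count() on it.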

 

Using the groupby() function involves three main steps:-

  1. Splitting: Data is split into groups based on the values in the specified columns, and each unique value in the grouping column becomes a group

     
  2. Applying: Operations such as aggregation, transformation, or filtering are applied to each group

     
  3. Combining: The results of the operations for each group are combined into a new dataframe or series
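
To make the three steps concrete, here is a small sketch that performs them by hand and compares the result with the built-in aggregation. The variable names (pieces, means, manual) are illustrative only, and this is not how Pandas implements grouping internally.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "IT", "HR", "Finance", "IT"],
    "Salary": [60000, 75000, 55000, 80000, 70000],
})

# 1. Splitting: iterating over a GroupBy object yields (key, sub-frame) pairs
pieces = {key: group for key, group in df.groupby("Department")}

# 2. Applying: compute one aggregate per group
means = {key: group["Salary"].mean() for key, group in pieces.items()}

# 3. Combining: assemble the per-group results into a single series
manual = pd.Series(means, name="Salary")

# The built-in aggregation performs the same three steps in one call
builtin = df.groupby("Department")["Salary"].mean()

print(manual)
print(builtin)
```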

 

Let’s take a look at an example.


import pandas as pd

# Sample employee data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Department': ['HR', 'IT', 'HR', 'Finance', 'IT'],
    'Salary': [60000, 75000, 55000, 80000, 70000]
}

df = pd.DataFrame(data)
print(df)

# Split the rows into one group per department
grouped = df.groupby('Department')
print("\n")
print(grouped.groups)  # maps each department to the row labels in its group

# Average salary within each department
average_salary_by_department = grouped['Salary'].mean()
print("\n")
print(average_salary_by_department)

Here, we created a dataframe with 3 columns - Name, Department, and Salary, grouped it by Department with the groupby() function, and then printed the groups using the .groups attribute. Then we applied the mean operation to the Salary column of the grouped data and stored the result, a series indexed by department, in a new variable.

Let’s look at the different parameters the groupby() function can accept:-

  • by: This parameter specifies the columns or keys by which the dataframe should be grouped. It accepts a single column name or a list of column names.

  • level: If the axis being grouped has a multi-level (hierarchical) index, this parameter specifies the level or levels on which the grouping has to be performed.

  • axis: This parameter specifies the axis along which the grouping should be performed. It is 0 by default, meaning rows are grouped, and 1 groups columns. It is deprecated in recent Pandas releases (since version 2.1), so avoid it in new code.

  • as_index: This parameter controls whether the grouping columns should become the index of the resulting dataframe. It accepts boolean values, and the default is True.

  • sort: Specifies whether to sort the resulting groups by the group keys. It accepts boolean values, and the default is True.

  • squeeze: This parameter reduced the dimensionality of the return type where possible, for example returning a series instead of a dataframe. It was deprecated in Pandas 1.1 and removed in Pandas 2.0, so it is not available in current versions.
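
The effect of as_index and sort is easiest to see side by side. The following is a minimal sketch with made-up data, not a complete tour of the parameters.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "Finance"],
    "Salary": [75000, 60000, 70000, 80000],
})

# Defaults: the group keys become the index and are sorted
default = df.groupby("Department")["Salary"].count()

# as_index=False keeps Department as a regular column;
# sort=False keeps the groups in order of first appearance
flat = df.groupby("Department", as_index=False, sort=False).count()

print(default)
print(flat)
```

Passing as_index=False is convenient when the result is going to be merged with other dataframes or written to a file, because the group key stays an ordinary column.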
     

In the following section, you will learn about the count() function in Pandas.

What is count() in Pandas?

The count() function is used for counting the number of non-null values in a dataframe or series. It is a quick way to find how many valid entries exist in each column. Called on a dataframe, it returns a series in which each element is the count of valid values in the corresponding column; called on a series, it returns a single integer.

Let’s take a look at an example.
 


import pandas as pd

# None becomes a missing value (NaN) in the numeric columns
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, None, 28, 22],
    'Salary': [60000, 75000, 55000, None, 70000]
}

df = pd.DataFrame(data)
print(df)

# Number of non-null values in each column
column_counts = df.count()
print("\n")
print(column_counts)

Here, we created a dataframe with 3 columns - Name, Age, and Salary. Age and Salary columns have null values, which are represented with NaN in Pandas. The count() function returns a series containing the count of valid entries in each column.

When called on a dataframe, the count() function accepts the following 2 parameters:-

  • axis: This parameter specifies the axis along which the count should be calculated. It accepts 0 (default) to produce one count per column and 1 to produce one count per row.

  • numeric_only: If this parameter is set to True, this function excludes non-numeric columns while calculating the count.
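
The sketch below shows both parameters on a small, made-up dataframe, along with count() called on a single column.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, None, 22],
    "Salary": [60000, 75000, None],
})

per_column = df.count()                # non-null values in each column
per_row = df.count(axis=1)             # non-null values in each row
numeric = df.count(numeric_only=True)  # skips the non-numeric Name column
age_count = df["Age"].count()          # on a series, count() returns one integer

print(per_column)
print(per_row)
print(numeric)
print(age_count)
```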
     

In the next section, we will look at the examples illustrating use cases of chaining groupby() and count() functions.

Chaining groupby() and count() in Pandas

The following are some examples where we have used the groupby() and count() functions together:-

Categorical Data Analysis

You can count the number of occurrences of each category in a column.


import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'A'],
    'Value': [10, 20, 15, 25, 30, 35]
}

df = pd.DataFrame(data)
print(df)
print("\n")

# Count the non-null entries of every other column within each category
category_counts = df.groupby('Category').count()
print(category_counts)

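Here we grouped the data based on the values of the Category column and then used the count() function on the grouped data. The result has one row per unique category and reports the number of non-null entries in each remaining column. Because the Value column has no missing values, this equals the number of occurrences of each category.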
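
When a column does contain missing values, count() and the number of rows in a group differ. The following sketch uses made-up data to contrast count() with size(), which counts every row in a group regardless of missing values.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Value": [10, None, 15, 25, None],
})

non_null = df.groupby("Category")["Value"].count()  # skips missing values
rows = df.groupby("Category").size()                # counts every row

print(non_null)  # A: 2, B: 1
print(rows)      # A: 3, B: 2
```

If the goal is simply "how many rows fall in each category", size() is usually the safer choice; count() is the right tool when you want to know how many valid values each group has.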

Product Inventory Analysis

If you have data about product inventory levels over time, you can use groupby() to group the data by ProductID and then use count() to calculate the number of days each product was in stock. This assumes the data contains one row per product for each day it was in stock.


import pandas as pd

# One row per product for each day it was in stock
data = {
    'Date': ['2023-08-01', '2023-08-01', '2023-08-02', '2023-08-02'],
    'ProductID': ['A', 'B', 'A', 'C']
}

df = pd.DataFrame(data)
print(df)
print("\n")

# Number of Date entries recorded for each product
product_days_in_stock = df.groupby('ProductID')['Date'].count()
print(product_days_in_stock)
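
If the same product can appear more than once on the same date, count() counts every row and overstates the number of days. A hedged alternative is nunique(), which counts distinct dates per product. The data below is hypothetical and deliberately contains a duplicate row.

```python
import pandas as pd

# Hypothetical log in which product A was recorded twice on 2023-08-01
df = pd.DataFrame({
    "Date": ["2023-08-01", "2023-08-01", "2023-08-01", "2023-08-02"],
    "ProductID": ["A", "A", "B", "A"],
})

rows_per_product = df.groupby("ProductID")["Date"].count()    # A: 3, B: 1
days_per_product = df.groupby("ProductID")["Date"].nunique()  # A: 2, B: 1

print(rows_per_product)
print(days_per_product)
```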

Event Frequency Analysis

If you have a log of events, you can use the groupby() function to group the data by Category and then use count() to calculate the number of times events of each category occurred.


import pandas as pd

# Event log: one row per event
data = {
    'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'C'],
    'Timestamp': ['2023-08-01', '2023-08-01', '2023-08-02', '2023-08-02', '2023-08-03', '2023-08-03', '2023-08-03']
}

df = pd.DataFrame(data)
print(df)
print("\n")

# Number of logged events in each category
event_counts = df.groupby('Category').count()
print(event_counts)
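
The same log can also be counted by timestamp, or by category and timestamp together. This is a sketch that reuses the data from the example above; converting the strings with pd.to_datetime() is an optional extra step added here so that the keys are real dates.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "C", "B", "A", "C"],
    "Timestamp": ["2023-08-01", "2023-08-01", "2023-08-02", "2023-08-02",
                  "2023-08-03", "2023-08-03", "2023-08-03"],
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"])

# Events per day, across all categories
events_per_day = df.groupby("Timestamp")["Category"].count()

# Events per (category, day) pair; the result has a two-level index
events_per_category_day = df.groupby(["Category", "Timestamp"]).size()

print(events_per_day)
print(events_per_category_day)
```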

Frequently Asked Questions

What is NaN in Python?

In Python, NaN stands for “Not a Number”, a special floating-point value used to represent undefined or unrepresentable numerical results. Pandas uses NaN as its default marker for missing values, which can come from gaps in the source data or from operations such as reindexing, merging, or concatenating dataframes whose labels do not match.
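
A short sketch of how NaN behaves in plain Python and in a Pandas series:

```python
import math
import pandas as pd

nan = float("nan")
print(nan == nan)        # False: NaN never compares equal, even to itself
print(math.isnan(nan))   # True

s = pd.Series([1.0, nan, 3.0])
missing_mask = s.isna()  # True where a value is missing
valid = s.count()        # count() skips the missing value

print(missing_mask.tolist())  # [False, True, False]
print(valid)                  # 2
```

Because NaN is not equal to itself, use isna() (or math.isnan() for plain floats) rather than == to test for missing values.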

What is an index in a Pandas series?

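An index holds the labels used to identify and look up the elements of a series. By default it is a range of integers starting at 0, or you can specify informative labels for each element. Labels do not have to be unique. An Index object is immutable, which means individual labels cannot be modified in place, but you can assign a whole new index or create a relabeled copy with rename().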
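
The sketch below, using made-up data, illustrates label-based access, what happens on an attempt to change a single label in place, relabeling through rename(), and a series with repeated labels.

```python
import pandas as pd

s = pd.Series([60000, 75000], index=["Alice", "Bob"])
bob_salary = s["Bob"]  # label-based access
print(bob_salary)

# An Index object does not support item assignment
try:
    s.index[0] = "Alicia"
    mutated = True
except TypeError:
    mutated = False
print(mutated)

# A relabeled copy can be created instead
renamed = s.rename({"Alice": "Alicia"})
print(renamed.index.tolist())

# Labels may repeat; a repeated label selects every matching element
dup = pd.Series([1, 2, 3], index=["x", "x", "y"])
print(dup["x"].tolist())
```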

What is a multi-level column index in Pandas?

It is a way to label the columns of a dataframe with multiple levels (a MultiIndex), so that each top-level column can contain sub-columns. Using Pandas, you can perform various operations on dataframes containing multi-level column indices, such as slicing, stacking, and aggregation.
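
Multi-level columns often appear as the result of a grouped aggregation. The following is a minimal sketch, with made-up data, in which applying several aggregations per column produces two-level column labels.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "IT", "HR", "IT"],
    "Salary": [60000, 75000, 55000, 70000],
    "Age": [25, 32, 41, 28],
})

# Outer level: the source column; inner level: the aggregation applied to it
summary = df.groupby("Department").agg({"Salary": ["count", "mean"], "Age": ["count"]})
print(summary.columns.tolist())

# Select one sub-column with a tuple, or a whole outer column by name
hr_mean = summary.loc["HR", ("Salary", "mean")]
salary_block = summary["Salary"]

print(hr_mean)
print(salary_block)
```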

Conclusion

In this article, you learned about the open-source Python library - Pandas. We discussed its features and two important functions it provides - groupby() and count().


Happy Learning!
