Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction
2. What is a Data Frame?
2.1. Installing Pandas
3. How to Concatenate DataFrames in Pandas?
3.1. Vertical Concatenation
3.2. Horizontal Concatenation
4. How to Reshape DataFrames in Pandas?
4.1. Pivoting
4.2. Melting
4.3. Stacking
5. Frequently Asked Questions
5.1. What is Series in Pandas?
5.2. What is NaN in Python?
5.3. What is a multi-level column index in Pandas?
6. Conclusion
Last Updated: Mar 27, 2024

Concatenate and Reshape Data frames in Pandas

Author Abhinav Anand

Introduction

Pandas is an open-source Python library that provides powerful data manipulation and analysis tools and is commonly used in machine learning workflows. It provides data structures such as Series and DataFrame, along with methods for manipulating them.


In this article, you will learn how to concatenate and reshape data frames in Pandas.

Let’s get started.

What is a Data Frame?

A DataFrame is one of the core data structures provided by Pandas. It is a two-dimensional tabular data structure where data is organized into rows and columns. Each column holds one data type, and different columns can hold different types, such as integers, floats, strings, etc.

Before moving forward, let's quickly see how you can install Pandas.

Installing Pandas

Run the following command in your terminal.

Bash

pip install pandas

Now, let’s see an example of creating a dataframe in Pandas.

Python

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 28],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

Here we passed a dictionary to the DataFrame() constructor, so each key became a column name and each list became that column's values. We then printed the dataframe using the print() function.
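
To make the earlier point about column data types concrete, here is a minimal sketch that rebuilds the same dataframe and inspects it. It only uses the standard shape and dtypes attributes; the exact dtype names printed can vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 28],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# shape is a (number of rows, number of columns) tuple
print(df.shape)

# dtypes lists the data type pandas inferred for each column
print(df.dtypes)
```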

Now, you will learn about the concatenate function in Pandas.

How to Concatenate DataFrames in Pandas?

Pandas provides the concat() function that allows you to combine multiple dataframes either vertically (along rows) or horizontally (along columns).

Let’s look at both ways of combining dataframes using the concat() function.

Vertical Concatenation

The concat() function performs vertical concatenation by default. You just have to pass a list containing the dataframes, and the function will return the resulting dataframe.

Python

import pandas as pd

data1 = {'A': [1, 2, 3], 'B': [4, 5, 6]}
data2 = {'C': [7, 8, 9], 'D': [10, 11, 12]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Concatenate vertically (along rows) and renumber the rows from 0
result_vertical = pd.concat([df1, df2], ignore_index=True)

print(df1)
print(df2)
print(result_vertical)

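Enabling ignore_index gives the concatenated dataframe a fresh index running from 0 to 5 instead of repeating each input's row labels. Because df1 and df2 share no column names, the result contains all four columns, and every cell that has no source data is filled with NaN.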
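
For contrast, the following sketch concatenates two dataframes that do share their column names, which is the more typical case for vertical concatenation. The names jan and feb are made up for illustration. It also shows what the index looks like with and without ignore_index.

```python
import pandas as pd

# Two hypothetical dataframes with identical columns
jan = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
feb = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Without ignore_index, each input keeps its own row labels
kept = pd.concat([jan, feb])

# With ignore_index=True, the rows are renumbered from 0
renumbered = pd.concat([jan, feb], ignore_index=True)

print(kept.index.tolist())            # [0, 1, 0, 1]
print(renumbered.index.tolist())      # [0, 1, 2, 3]
print(renumbered.isna().sum().sum())  # 0, since the columns line up
```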

Horizontal Concatenation

To concatenate horizontally, you need to specify the axis parameter as 1.

Python

import pandas as pd

data1 = {'A': [1, 2, 3], 'B': [4, 5, 6]}
data2 = {'C': [7, 8, 9], 'D': [10, 11, 12]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Concatenating horizontally (along columns)
result_horizontal = pd.concat([df1, df2], axis=1)

print(df2)
print(df1)
print(result_horizontal)

No missing values appear because both dataframes have the same row index (0 to 2), so every row of df1 lines up with a row of df2.
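
Horizontal concatenation aligns rows by index label, not by position. The sketch below uses two small made-up dataframes (left and right) whose indexes only partly overlap, to show where NaN appears and how the join parameter of concat() changes the result.

```python
import pandas as pd

# Hypothetical inputs: labels 1 and 2 are shared, 0 and 3 are not
left = pd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({'C': [7, 8, 9]}, index=[1, 2, 3])

# Default join='outer' keeps every label and fills the gaps with NaN
outer = pd.concat([left, right], axis=1)

# join='inner' keeps only the labels present in both inputs
inner = pd.concat([left, right], axis=1, join='inner')

print(outer)
print(inner)
```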

In the next section, you will learn about the various functions used for reshaping dataframes in Pandas.

How to Reshape DataFrames in Pandas?

Reshaping dataframes in Pandas involves transforming the layout of your data to make it compatible with your analysis needs.

Let’s look at the various ways of reshaping dataframes in Pandas.

Pivoting

Pivoting is the process of transforming data from a long format to a wide format. The unique values in one column become the new columns, and the values under a different column populate the cells under these new columns.

[Figure: pivoting illustration. Credit: pandas.pydata.org]

Pandas provides the pivot() function for performing pivoting. It takes the following parameters:

  1. index: The column whose unique values become the row index of the pivoted dataframe.
     
  2. columns: The column (or columns) whose unique values become the new column labels of the pivoted dataframe.
     
  3. values: The column (or columns) whose values fill the cells of the pivoted dataframe.

 

Let’s see an example.

Python

import pandas as pd

data = {
    'Date': ['2023-08-01', '2023-08-01', '2023-08-02', '2023-08-02'],
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 15, 25]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

pivot_df = df.pivot(index='Date', columns='Category', values='Value')

print("\nPivoted DataFrame:")
print(pivot_df)

Here we used the unique values of the Category column as the new columns and the unique values of the Date column as the row index. The Value column fills the cells.
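
One caveat worth knowing: pivot() needs each (index, columns) pair to be unique, and it raises a ValueError when a pair repeats. The sketch below uses made-up data with a repeated pair to show the error, then falls back to pivot_table(), a related Pandas function not covered in this article, which aggregates the duplicates. Using 'sum' as the aggregation is an assumption chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-08-01', '2023-08-01', '2023-08-01'],
    'Category': ['A', 'A', 'B'],  # the pair ('2023-08-01', 'A') appears twice
    'Value': [10, 5, 20],
})

try:
    df.pivot(index='Date', columns='Category', values='Value')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot() raised:', err)

# pivot_table() combines the duplicate entries with an aggregation function
summed = df.pivot_table(index='Date', columns='Category',
                        values='Value', aggfunc='sum')
print(summed)
```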

Melting

Melting is the process of transforming data from a wide format to a long format. It is also known as unpivoting. In this process, multiple columns are converted into a single column with the help of an additional column that captures the original column labels.

[Figure: melting illustration. Credit: pandas.pydata.org]

Pandas provides the melt() function for performing melting. Its most commonly used parameters are:

  1. id_vars: The columns that won't be melted. It can be a single column or a list of columns.
     
  2. var_name: The name of the new column that will capture the original column labels. It defaults to “variable”.
     
  3. value_name: The name of the new column containing the values from the melted columns. It defaults to “value”.

 

Let’s see an example.

Python

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Math': [90, 85, 70],
    'Science': [88, 76, 92],
    'History': [78, 82, 65]
}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

melted_df = df.melt(id_vars=['Name'])

print("\nMelted DataFrame:")
print(melted_df)

Here all columns except Name are melted. The variable column captures the name of the original column, and the value column stores the values from these melted columns.
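
The example above relies on the default column names. As a small variation, the sketch below sets var_name and value_name explicitly and also passes value_vars, a melt() parameter not listed above that restricts which columns are melted. The names Subject and Score are chosen here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Math': [90, 85],
    'Science': [88, 76],
    'History': [78, 82],
})

long_df = df.melt(
    id_vars=['Name'],
    value_vars=['Math', 'Science'],  # History is left out of the result
    var_name='Subject',              # replaces the default "variable"
    value_name='Score',              # replaces the default "value"
)
print(long_df)
```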

Stacking

Stacking is the process of pivoting a level of the column labels into the row index, which makes the dataframe taller and narrower. It is commonly used in dataframes containing a multi-level column index.

[Figure: stacking illustration. Credit: pandas.pydata.org]

Pandas provides the stack() function for performing stacking. Its level parameter identifies the column level, by position or by name, that you want to pivot into the row index. By default, this function pivots the innermost level.

Let’s see an example.

Python

import pandas as pd

data = {
    ('Electronics', 'Q1'): [1000, 1500],
    ('Electronics', 'Q2'): [1200, 1800],
    ('Clothing', 'Q1'): [500, 800],
    ('Clothing', 'Q2'): [600, 950]
}

df = pd.DataFrame(data)

df.index = ['Region 1', 'Region 2']
df.columns.names = ['Category', 'Quarter']

print("Original DataFrame:")
print(df)

stacked_df = df.stack(0)

print("\nStacked DataFrame:")
print(stacked_df)

There are two levels of column labels in the input dataframe. The topmost level has an index of 0, and we specified 0 while calling the stack() function, so it pivoted the topmost level of column labels.
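
For comparison, the sketch below calls stack() with no argument on the same data, so the innermost level (Quarter) moves into the row index instead. It then calls unstack(), the inverse operation, which is not covered in this article, to move that level back into the columns. The column order after the round trip may differ from the original.

```python
import pandas as pd

df = pd.DataFrame(
    {
        ('Electronics', 'Q1'): [1000, 1500],
        ('Electronics', 'Q2'): [1200, 1800],
        ('Clothing', 'Q1'): [500, 800],
        ('Clothing', 'Q2'): [600, 950],
    },
    index=['Region 1', 'Region 2'],
)
df.columns.names = ['Category', 'Quarter']

# Default level: the innermost column level (Quarter) moves into the rows
by_quarter = df.stack()
print(by_quarter)

# unstack() moves the innermost row level back into the columns
restored = by_quarter.unstack()
print(restored)
```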

Frequently Asked Questions

What is Series in Pandas?

A series is a one-dimensional labeled data structure that can hold data types such as numeric, string, boolean, etc. Each element in a series has a corresponding label called an index which may or may not be a numeric value.
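
As a quick illustration, the sketch below builds a Series with string labels and reads values by label and by position. The data is made up for the example.

```python
import pandas as pd

# A Series with a non-numeric index
ages = pd.Series([25, 30, 28], index=['Alice', 'Bob', 'Charlie'], name='Age')

print(ages['Bob'])   # label-based access: 30
print(ages.iloc[0])  # position-based access: 25
```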

What is NaN in Python?

In Python, NaN stands for “Not a Number”, a special floating point value used to represent undefined or unrepresentable numerical values. In Pandas, NaN represents missing values, such as those that arise when concatenating dataframes whose rows or columns do not line up.
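
A short sketch of how NaN behaves in practice: it never compares equal to itself, so missing values are detected with helpers such as math.isnan() or the isna() method rather than with ==.

```python
import math
import pandas as pd

nan = float('nan')

print(nan == nan)       # False: NaN is not equal to itself
print(math.isnan(nan))  # True

s = pd.Series([1.0, nan, 3.0])
print(s.isna().tolist())  # [False, True, False]
print(s.sum())            # 4.0, because sum() skips NaN by default
```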

What is a multi-level column index in Pandas?

It is a way to represent data in a dataframe using multiple levels of column labels, meaning each top-level column has sub-columns. Using Pandas, you can perform various operations on dataframes containing multi-level column indices, such as slicing, stacking, aggregation, etc.
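
The sketch below shows two common ways to slice such a dataframe, using made-up sales data: selecting a top-level label returns its sub-columns, and xs() selects one label from a lower level across all top-level columns.

```python
import pandas as pd

df = pd.DataFrame(
    {
        ('Electronics', 'Q1'): [1000, 1500],
        ('Electronics', 'Q2'): [1200, 1800],
        ('Clothing', 'Q1'): [500, 800],
    },
    index=['Region 1', 'Region 2'],
)

print(df.columns.nlevels)  # 2

# Selecting a top-level label returns its sub-columns (Q1 and Q2)
electronics = df['Electronics']
print(electronics)

# xs() picks one label from the second level across all categories
q1 = df.xs('Q1', axis=1, level=1)
print(q1)
```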

Conclusion

In this article, you learned how to install Pandas and how to concatenate and reshape dataframes with the help of examples. We discussed vertical and horizontal concatenation along with three ways of reshaping dataframes: pivoting, melting, and stacking.


Happy Learning!
