Table of contents
1. Introduction
2. What is Pandas?
3. Pandas in Data Science
4. Working of Pandas
5. How to import Pandas as pd
6. Examples of import Pandas as pd
7. Features of Pandas
8. How to install Pandas?
9. Data Structures in Pandas
10. How to create a Series in Pandas?
10.1. Code Implementation
11. How to create a Dataframe in Pandas?
11.1. Syntax of dataframe in Pandas
11.2. Parameters dataframe in Pandas
11.3. Code Implementation
12. Some Common Operations using Pandas
13. Reading CSV file
13.1. Code Implementation
14. Selecting Rows and Columns
14.1. Code Implementation
15. Grouping and Aggregating Data
15.1. Code Implementation
16. Advantages of Pandas
17. Frequently Asked Questions
17.1. Why is import pandas as pd used?
17.2. How do I import pandas locally?
17.3. How to import pandas in Python script?
17.4. What does import as PD mean?
18. Conclusion
Last Updated: Oct 8, 2024

Import Pandas as pd

Author Avni Gupta

Introduction

When working with data in Python, the pandas library is the standard tool for data manipulation and analysis. Writing import pandas as pd gives the library the short alias pd, so every call can be written with the pd prefix instead of the full name. This convention keeps code concise and readable while giving access to all of pandas' functionality.

Importing Pandas as pd

In this blog, we will learn what the pandas library is, how to import it, and look at examples that import pandas as pd.

What is Pandas?

Pandas is an open-source library for Python used for data manipulation and analysis. It provides easy-to-use data structures and functions for handling data efficiently. Using Pandas, one can manipulate large numeric tables with just a few lines of code.

The name Pandas is often expanded as "Python Data Analysis Library." It is a very popular and powerful open-source tool for data analysis, widely used for Data Science and Machine Learning tasks.

Pandas in Data Science

Pandas is a powerful open-source data manipulation and analysis library for Python. It is widely used in the field of data science for tasks related to data cleaning, exploration, and analysis. The name "Pandas" is derived from the term "Panel Data," which is an econometrics term for multidimensional structured data sets.

Working of Pandas


The working of Pandas involves using its core data structures, primarily the DataFrame and Series, to manipulate and analyze data effectively. Below are the key aspects of how Pandas works:

1. Importing Pandas: To use Pandas, you need to import it into your Python script or Jupyter Notebook.

import pandas as pd

 

2. Data Structures: Pandas offers two main data structures, DataFrame and Series. A DataFrame is a two-dimensional table with labeled axes (rows and columns). It can be thought of as a spreadsheet or SQL table. A Series, on the other hand, is a one-dimensional labeled array capable of holding any data type.

3. Creating DataFrames and Series: You can create a DataFrame by passing a dictionary of lists or NumPy arrays to the pd.DataFrame() constructor.

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
       'Age': [25, 30, 35],
       'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

 

Creating a Series is similar, but with a single list or array.

import numpy as np
series = pd.Series([1, 3, 5, np.nan, 6, 8])  # np.nan marks a missing value

 

4. Data Cleaning: Pandas provides methods for handling missing data, such as dropna() to drop missing values and fillna() to fill missing values.

df.dropna()           # Drop rows with missing values
df.fillna(value=0)    # Fill missing values with a specific value
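
As a small illustration of the two methods above, the sketch below builds a hypothetical DataFrame (the name scores and its values are made up for this example) that contains missing entries. It also shows that both methods return a new object and leave the original unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing values (np.nan)
scores = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Score": [90.0, np.nan, 75.0],
    "Bonus": [5.0, 2.0, np.nan],
})

dropped = scores.dropna()        # keeps only rows without any missing value
filled = scores.fillna(value=0)  # replaces every missing value with 0

print(dropped)
print(filled)
print(scores.isna().sum())       # the original still reports its missing values
```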

 

5. Data Selection and Indexing: You can select specific columns or rows using column names or boolean indexing. 

# Selecting a column
df['Name']
# Filtering data based on a condition
df[df['Age'] > 30]

 

6. Data Manipulation: Pandas allows for various operations like arithmetic operations, string operations, and applying functions to data.

# Adding a new column
df['Age_2_years_later'] = df['Age'] + 2

 

7. Grouping and Aggregation: Grouping data using groupby() and applying aggregate functions.

# Grouping by 'City' and calculating the mean age in each city
df.groupby('City')['Age'].mean()

 

8. Merging and Concatenating: Combining multiple DataFrames using merge() or concat().

# Merging two DataFrames based on a common column
pd.merge(df1, df2, on='common_column')
# Concatenating DataFrames vertically
pd.concat([df1, df2])
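
The two calls above assume that df1 and df2 already exist. A minimal runnable sketch, using small hypothetical tables (the column names id, name, and city and the extra frame df3 are assumptions made for this example), might look like this:

```python
import pandas as pd

# Hypothetical tables that share an 'id' column
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "city": ["Paris", "Tokyo", "Lima"]})
df3 = pd.DataFrame({"id": [4, 5], "name": ["Dana", "Eli"]})

# merge() performs an inner join by default: only ids found in both tables remain
merged = pd.merge(df1, df2, on="id")
print(merged)

# concat() stacks df3 under df1; ignore_index=True renumbers the rows from 0
stacked = pd.concat([df1, df3], ignore_index=True)
print(stacked)
```

Passing how='left', how='right', or how='outer' to merge() changes which unmatched rows are kept.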

 

9. Input/Output: Reading and writing data from/to various file formats, such as CSV, Excel, SQL databases.

# Reading from CSV
df = pd.read_csv('data.csv')
# Writing to CSV
df.to_csv('output.csv', index=False)

 

10. Time Series Data: Pandas supports time series analysis with functionalities like date range creation, resampling, and shifting.

# Creating a date range
date_range = pd.date_range('2023-01-01', '2023-12-31', freq='D')
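
Resampling and shifting, mentioned above, can be sketched on a short daily series. The variable names and sales figures below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import pandas as pd

# Six consecutive days with made-up sales figures
days = pd.date_range("2023-01-01", periods=6, freq="D")
sales = pd.Series([10, 20, 30, 40, 50, 60], index=days)

# Resampling: combine the daily values into two-day totals
two_day_totals = sales.resample("2D").sum()
print(two_day_totals)

# Shifting: move every value one day forward, so each date shows the previous day's sales
previous_day = sales.shift(1)
print(previous_day)
```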

How to import Pandas as pd

Once Pandas is installed (installation is covered later in this article), you can import it as pd. By convention, pandas is imported with the pd alias, which means we refer to pandas as pd in the code instead of writing "pandas" each time.

An alias is an alternate name for referring to the same thing. We can also import pandas without an alias, but using the alias is more convenient.

import pandas as pd
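
To make the idea of an alias concrete, the short sketch below imports the library both ways and checks that the two names refer to the same module. The variables a and b are only illustrative.

```python
import pandas
import pandas as pd

# Both names are bound to the same module object
print(pd is pandas)

# So these two calls do exactly the same thing
a = pandas.Series([1, 2, 3])
b = pd.Series([1, 2, 3])
print(a.equals(b))
```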

Examples of import Pandas as pd

Let us consider an example to understand this:


import pandas as pd

# Creating a dictionary with sample data
data = {
'Name': ['Rahul', 'Rohit', 'Virat'],
'Age': [25, 30, 18],
'City': ['Vizag', 'Mumbai', 'Delhi']
}

# Creating a DataFrame from the dictionary
df = pd.DataFrame(data)

# Displaying the DataFrame
print("Original DataFrame:")
print(df)

# Adding a new column
df['Playing Style'] = ['Classic', 'Hitman', 'Perfectionist']

# Displaying the DataFrame after adding a new column
print("\nDataFrame with a new column:")
print(df)

Output:

output

Features of Pandas

Some of the key features of pandas are:

  • Data Handling: The two data structures provided by pandas, Series and DataFrame, make it easy to organize, manage, and explore data.
     
  • Data manipulation: Pandas provides a range of functions to manipulate data, including filtering, sorting, grouping, joining, merging, and reshaping.
     
  • Support for file formats: Pandas can read and write many file formats, such as CSV, Excel, JSON, and SQL databases, so data from different sources can be analyzed with the same tools.
     
  • Data cleaning: Pandas provides functions to handle missing data and remove duplicates. Real-world data can be very messy, and these functions help tidy it up so that it becomes easier to work with.
     
  • Data analysis: Pandas provides several statistical and mathematical functions for data analysis, including descriptive statistics, correlation, and time series operations.
     
  • Data visualization: Pandas integrates with other Python libraries, such as Matplotlib and Seaborn, to plot data directly from Series and DataFrames, which makes patterns in the data easier to see.
     
  • Data Filtering: We can filter the data to keep only the rows relevant to the analysis we want to perform. We can also remove repeated records by keeping only the unique values.
     
  • Mathematics: Pandas supports arithmetic and other mathematical operations on whole columns at once. We can also sort the data into whatever order the analysis needs.

How to install Pandas?

For a detailed installation guide, see How to install Pandas in Python.

In short, you can install Pandas using pip, the Python package installer. Open the terminal or command prompt and run the following command:

pip install pandas

 

This will install the latest version of Pandas on your system.
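
One quick way to confirm that the installation worked is to import the library and print its version. This is only a sanity check; the version string printed will depend on your environment.

```python
import pandas as pd

# Prints the installed pandas version; the exact value varies by environment
print(pd.__version__)
```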

Data Structures in Pandas

Pandas provides two primary data structures:

  • Series: A one-dimensional labeled array that can hold data of any type (integers, floats, strings, etc.). A Series can be created from a list, array, or dictionary; each element is associated with an index label.
     
  • DataFrame: A two-dimensional labeled data structure with columns of potentially different types. You can look at a DataFrame as a collection of Series objects, where each column is a Series. You can create DataFrames from various data sources, including CSV files, Excel files, SQL databases, and JSON data.
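
The relationship between the two structures can be seen in a few lines. In this sketch, the names stock, people, and age and all values are hypothetical; it builds a Series from a dictionary and then pulls a single column out of a DataFrame to show that the column is itself a Series.

```python
import pandas as pd

# A Series built from a dictionary: the keys become the index labels
stock = pd.Series({"apples": 3, "pears": 5})
print(stock)

# A DataFrame whose columns hold different types
people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Selecting one column returns a Series
age = people["Age"]
print(type(age).__name__)
print(people.dtypes)
```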

How to create a Series in Pandas?

To create a Series, we can use the pd.Series() constructor of pandas. The example code below shows how.

Code Implementation


import pandas as pd
import numpy as np
ser = pd.Series(dtype="float64")  # explicit dtype, so the empty Series prints as float64
print(ser)
data = np.array(['n', 'i', 'n', 'j', 'a', 's'])
ser = pd.Series(data)
print(ser)

Output 

Series([], dtype: float64)
0    n
1    i
2    n
3    j
4    a
5    s
dtype: object

 

In the above code, we first create an empty Series named ser and then create a new Series from a NumPy array.
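
A Series does not have to use the default 0 to n-1 labels. As a rough sketch (the subject names and marks are invented for illustration), custom labels can be passed through the index argument and then used to look values up:

```python
import pandas as pd

# Hypothetical marks, labeled by subject instead of by position
marks = pd.Series([85, 92, 78], index=["maths", "physics", "chemistry"], name="marks")

print(marks["physics"])  # lookup by label
print(marks.iloc[0])     # lookup by position
```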

How to create a Dataframe in Pandas?

To create a Dataframe, we can make use of the DataFrame() function of pandas. Let us look at how it works.

Syntax of dataframe in Pandas

pd.DataFrame(data, index, columns)

Parameters dataframe in Pandas

  • data - The data from which to create the DataFrame.
     
  • index - The row labels. By default, rows are labeled from 0 to n-1, where n is the number of rows.
     
  • columns - The names of the columns in the DataFrame. If they are not given explicitly and the data does not supply names (as a dictionary would), columns are labeled from 0 to n-1, where n is the number of columns.
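
The effect of these parameters is easiest to see side by side. The sketch below uses hypothetical rows and labels; it builds one DataFrame with explicit row and column labels and one that falls back to the defaults.

```python
import pandas as pd

# Hypothetical rows: each inner list is one row of the table
rows = [["Rahul", 25], ["Rohit", 30], ["Virat", 18]]

# Explicit row labels and column names
labeled = pd.DataFrame(rows, index=["a", "b", "c"], columns=["Name", "Age"])
print(labeled)

# No index or columns given: both default to 0, 1, 2, ...
default = pd.DataFrame(rows)
print(default)
```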

Code Implementation


import pandas as pd
df = pd.DataFrame()
print(df)
l = ['n', 'i', 'n', 'j', 'a', 's']
df = pd.DataFrame(l,columns = ['Letters'])
print(df)

Output

Output for the dataframe created

In the above code, we first created an empty DataFrame named df and then created a new one from the list l, which is ['n', 'i', 'n', 'j', 'a', 's'], placing its values in a column called Letters.

Some Common Operations using Pandas

In this section, we will look at some operations we can perform after importing pandas as pd. We will be using a file as well. For the given examples, the file name is just filename.csv, and its contents are:

date,column_name,column_name1,column_name2
2022-01-01,A,10,5
2022-01-01,A,20,6
2022-01-01,B,15,7
2022-01-02,A,12,4
2022-01-02,B,18,6
2022-01-02,B,21,8
2022-01-03,A,9,3
2022-01-03,B,12,5
2022-01-03,B,16,6

 

The CSV file contains sales data for different products (column_name) on different dates (date). Two additional columns (column_name1 and column_name2) contain some additional sales information. The column names are date, column_name, column_name1, and column_name2, corresponding to the column names used in the pandas codes below.

You can copy and paste these lines into a text editor such as Notepad, save the file as filename.csv, and use it for the examples below.

Reading CSV file

Let us look at how to read a CSV file, as it is a basic yet important concept used often while working on data analysis.

Code Implementation

import pandas as pd
data = pd.read_csv('filename.csv')
print(data.head())

 

Output

Output for reading a CSV file

Explanation

The code above reads the CSV file using the read_csv() function and prints the first 5 rows of the resulting DataFrame using the head() function. This is a common workflow when working with pandas and CSV files. It can be easily modified to suit different requirements, such as reading different file formats or printing different parts of the DataFrame.
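
As one possible variation, the sketch below reads the same sample data from an in-memory string instead of a file, which makes it runnable without creating filename.csv first. Using io.StringIO and the parse_dates option are choices made for this illustration, not requirements of the workflow above.

```python
import io
import pandas as pd

# The sample data from this section, held in a string instead of a file
csv_text = """date,column_name,column_name1,column_name2
2022-01-01,A,10,5
2022-01-01,A,20,6
2022-01-01,B,15,7
2022-01-02,A,12,4
2022-01-02,B,18,6
2022-01-02,B,21,8
2022-01-03,A,9,3
2022-01-03,B,12,5
2022-01-03,B,16,6
"""

# parse_dates converts the 'date' column from text to datetime values
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(data.shape)    # (rows, columns)
print(data.dtypes)   # column types after parsing
print(data.head(3))  # head() accepts the number of rows to show
```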

Selecting Rows and Columns

After reading a csv file, how would you select rows and columns for further processing? Let us look at how to do that.

Code Implementation

import pandas as pd
data = pd.read_csv('filename.csv')
subset = data.loc[(data['column_name'] == 'A') , ['column_name1', 'column_name2']]
print(subset.head())

 

Output

Output for selecting rows and columns

Explanation

This code snippet demonstrates how to use pandas to filter a CSV file based on specific criteria and select specific columns.

  • subset = data.loc[(data['column_name'] == 'A'), ['column_name1', 'column_name2']]: This line filters the DataFrame data based on one condition: column_name should have a value of 'A'. The filtered data is then stored in a new DataFrame object named subset, which only contains the columns column_name1 and column_name2.
     
  • print(subset.head()): This line prints the first 5 rows of the DataFrame subset using the head() function.
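
The same pattern extends to more than one condition. The following sketch, which uses a small in-memory table with the same column names (the four rows are taken from the sample file), combines two conditions with & and also shows position-based selection with iloc:

```python
import pandas as pd

# A few rows from the sample data, built in memory for this example
data = pd.DataFrame({
    "column_name": ["A", "A", "B", "A"],
    "column_name1": [10, 20, 15, 12],
    "column_name2": [5, 6, 7, 4],
})

# Each condition needs its own parentheses when combined with & (and) or | (or)
mask = (data["column_name"] == "A") & (data["column_name1"] > 10)
subset = data.loc[mask, ["column_name1", "column_name2"]]
print(subset)

# iloc selects by position: here the first two rows and first two columns
first_block = data.iloc[:2, :2]
print(first_block)
```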

Grouping and Aggregating Data

Let us look at how we can group the data based on columns together and aggregate values of the data after grouping using the ‘groupby’ and the ‘agg’ functions.

Code Implementation

import pandas as pd
data = pd.read_csv('filename.csv')
grouped_data = data.groupby('column_name').agg({'column_name1': 'sum', 'column_name2': 'mean'})
print(grouped_data.head())

 

Output

Output for grouping and aggregating data

Explanation

This code demonstrates how to group data in a CSV file by a specific column and perform aggregate functions on the grouped data using pandas. Here is a step-by-step breakdown of what is happening :

  • grouped_data = data.groupby('column_name').agg({'column_name1': 'sum', 'column_name2': 'mean'}): This line groups the DataFrame data by a specific column named column_name. The groupby() function in pandas groups the data based on the values in the specified column.
     
  • The agg() function is used to apply aggregate functions to the grouped data. In this case, we are computing the sum of column_name1 and the mean of column_name2 for each group. The resulting data is stored in a new DataFrame object called grouped_data.
     
  • print(grouped_data.head()): This line prints the first 5 rows of the DataFrame grouped_data using the head() function.
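
To check the grouping logic by hand, the sketch below rebuilds the three relevant columns of the sample file in memory and applies the same groupby() and agg() call. The in-memory construction is an assumption made so the example runs without filename.csv.

```python
import pandas as pd

# The product and sales columns from the sample file
data = pd.DataFrame({
    "column_name": ["A", "A", "B", "A", "B", "B", "A", "B", "B"],
    "column_name1": [10, 20, 15, 12, 18, 21, 9, 12, 16],
    "column_name2": [5, 6, 7, 4, 6, 8, 3, 5, 6],
})

grouped_data = data.groupby("column_name").agg(
    {"column_name1": "sum", "column_name2": "mean"}
)
print(grouped_data)
```

For group A, the sum of column_name1 is 10 + 20 + 12 + 9 = 51 and the mean of column_name2 is (5 + 6 + 4 + 3) / 4 = 4.5; for group B, the values are 82 and 6.4.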

Advantages of Pandas

Here are some of the advantages of the Pandas Library.

  • Easy to use: Pandas is a very easy-to-use library and doesn't require much prerequisite knowledge. Basic Python coding skills are enough to get started with Pandas.
     
  • Data Merging: Pandas makes it straightforward to combine datasets, including large ones, using functions such as merge() and concat().
     
  • Efficient Data Structures: As discussed earlier, Pandas provides the one-dimensional Series and the two-dimensional DataFrame, which make data analysis and manipulation very easy.
     
  • Data flexibility: Data can be reshaped, filtered, and customized in many ways when it is held in DataFrames.
     
  • Less code: Pandas does not require much code even for very extensive tasks. A few of its functions are often enough; the main skill is knowing which one to use when.

Frequently Asked Questions

Why is import pandas as pd used?

It provides a shorthand alias ('pd') for the Pandas library, making code more concise and readable.

How do I import pandas locally?

To import pandas locally, first install it using pip by running pip install pandas in your terminal or command prompt. Then, in your Python script or notebook, import it with import pandas as pd to use the library.

How to import pandas in Python script?

To import pandas in a Python script, use the following line of code: import pandas as pd

This imports the pandas library and allows you to use it with the alias pd in your script.

What does import as PD mean?

import pandas as pd means importing the pandas library and giving it the alias pd. This allows you to refer to pandas functions and objects using pd, making your code shorter and more readable.

Conclusion

Pandas is a versatile library that provides a wide range of tools for data manipulation and analysis, making it an essential tool for data scientists and analysts working with structured data.
