Table of contents
1. Introduction
1.1. Pandas Series
1.2. Pandas Dataframe
2. Reading Data:
3. Inspecting Data:
4. Accessing individual rows and columns
5. Plotting data with Pandas
6. Write Pandas Dataframe to a File:
7. Frequently Asked Questions
8. Key Takeaways
Last Updated: Mar 27, 2024

Must Know Functions in pandas


Introduction

Pandas, whose name is derived from "panel data" and is also a play on "Python data analysis," is a powerful, fast, flexible, and easy-to-use open-source data analysis and manipulation tool built on top of Python. It is mainly used for data analysis and can import data from various file formats such as CSV, JSON, Excel, and HTML. It offers special data structures and operations for manipulating numerical tables and time-series data.

 

 

The Data Structures provided by Pandas are of two distinct types:

  1. Pandas Series
  2. Pandas DataFrame


Before jumping into these two data structures, let's take a look at the installation process in case you don't have pandas installed on your machine.

Pandas is a Python library, so Python must be installed on your system first. You can then install pandas either with pip, the Python package installer (pip install pandas), or with conda (conda install pandas); the Anaconda distribution already includes pandas along with other commonly used libraries. With that, you should have pandas installed!
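A quick way to confirm the installation is to import the library and print its version. This is only a minimal check and not part of the original walkthrough; the variable name installed_version is illustrative.

```python
import pandas as pd

# If this import succeeds, pandas is available in the active environment
installed_version = pd.__version__
print("pandas version:", installed_version)
```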

Now let’s continue with the two Data structures as mentioned above.

 

Pandas Series

A pandas Series is a one-dimensional labeled array capable of storing any data type. You can think of it as a single column in an Excel sheet whose rows are labeled. The labels need not be unique but must be of a hashable type.

Let's take a look at a simple example and its comparison with a NumPy array.

import pandas as pd
import numpy as np

s1 = pd.Series([10,20,30,40,50])  #series is indexed
s2 = np.array([1,2,3,4,5])

print("s1:")
print(s1)
print("\ns2:")
print(s2)

 

Output:

s1:
0    10
1    20
2    30
3    40
4    50
dtype: int64

s2:
[1 2 3 4 5]

We can also replace the default integer index by passing explicit labels through the index argument when defining a Series. A Series can also be created from a dictionary or a list.

# changing the index labels
s3 = pd.Series([1,2,3], index=['a','b','c'])
print("s3:")
print(s3)

# We can also create series object from a dictionary
s4 = pd.Series({'a':1, 'b':2, 'c':3})
print("s4:")
print(s4)

 

Output:

s3:
a    1
b    2
c    3
dtype: int64
s4:
a    1
b    2
c    3
dtype: int64

Elements in Series can be easily accessed by either their position or by using a label if it's defined.
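The access patterns just described can be sketched as follows. The Series s and dup are made-up examples; .loc and .iloc are the explicit label-based and position-based accessors that pandas provides.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = s['b']        # label-based lookup
by_loc = s.loc['b']      # explicit label-based lookup
by_position = s.iloc[1]  # explicit position-based lookup (0-indexed)
print(by_label, by_loc, by_position)

# Labels need not be unique: a repeated label returns every matching element
dup = pd.Series([1, 2, 3], index=['x', 'x', 'y'])
print(dup['x'])
```

Using .loc and .iloc makes it explicit whether a label or a position is meant, which avoids ambiguity when the labels themselves are integers.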

 

Pandas Dataframe

A pandas DataFrame is a two-dimensional, labeled, size-mutable, and potentially heterogeneous data structure. It's like an Excel sheet: data is arranged in rows and columns, and the DataFrame provides numerous functions to extract, analyze, and manipulate data from the given dataset.

Let's take a look at a simple example of creating a DataFrame.

pd.DataFrame({"Name":['Ritik', 'Suveer', 'Aman'], "Marks":[100,99,99.5]})


Output:

     Name  Marks
0   Ritik  100.0
1  Suveer   99.0
2    Aman   99.5

Creating Pandas Dataframe from a list:

my_list = [[1,2,3,4],
          [5,6,7,8],
          [9,10,11,12],
          [13,14,15,16],
          [17,18,19,20]]
df = pd.DataFrame(my_list)

 

Output (contents of df):

    0   1   2   3
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12
3  13  14  15  16
4  17  18  19  20

 

Moving forward, let's look at some simple functions that are very helpful when dealing with rows and columns while analyzing data.

 

You can download the dataset from https://www.kaggle.com/imakash3011/customer-personality-analysis?select=marketing_campaign.csv

 

Reading Data:

data = pd.read_csv('marketing_campaign.csv', sep='\t')  # the file is tab-separated
# print(data) would print the complete DataFrame

data.head()    # shows the first 5 rows
data.head(10)  # shows the first 10 rows

 


 

 

Similarly, data.tail() shows the last records from the data (the last 5 rows by default).
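To try the reading step without downloading the dataset, here is a minimal sketch that reads a tiny tab-separated sample from memory. The sample text and the names raw, sample, and wrong are hypothetical stand-ins for marketing_campaign.csv; only read_csv, head, and tail come from the walkthrough.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for marketing_campaign.csv
raw = "ID\tYear_Birth\tEducation\n1\t1957\tGraduation\n2\t1954\tPhD\n3\t1965\tMaster\n"

sample = pd.read_csv(io.StringIO(raw), sep='\t')
print(sample.head(2))   # first 2 rows
print(sample.tail(1))   # last row

# Without sep='\t' the default comma separator leaves everything in one column
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.shape)
```

If a freshly loaded file shows up as a single wide column, a wrong separator is a likely cause.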

 

We can also read JSON or HTML data by passing a URL to the respective function, for example:

json_data = pd.read_json('URL_of_JSON')

html_data = pd.read_html('html_url')  # returns a list of DataFrames, one per table found
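As a rough illustration of the JSON case, the sketch below parses a small in-memory JSON string instead of a URL. The records and the name raw_json are hypothetical; a real URL or file path can be passed to read_json in the same position.

```python
import io
import pandas as pd

# Hypothetical JSON records; a URL or file path can be passed instead
raw_json = '[{"ID": 1, "Education": "PhD"}, {"ID": 2, "Education": "Master"}]'

json_data = pd.read_json(io.StringIO(raw_json))
print(json_data)
```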

 

Inspecting Data:

-To get column names from our Dataframe, use data.columns:

It's used frequently to confirm a column's spelling while trying to access data by column name.

 

-To get the shape of data (number of rows and columns), use data.shape:
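The original snippets for these two lookups are not available, so the following is a minimal sketch that assumes they used the columns and shape attributes. The small frame below is a hypothetical stand-in for the marketing dataset.

```python
import pandas as pd

# Small hypothetical frame standing in for the marketing dataset
data = pd.DataFrame({"ID": [1, 2, 3],
                     "Year_Birth": [1957, 1954, 1965],
                     "Education": ["Graduation", "PhD", "Master"]})

column_names = list(data.columns)   # column labels, in order
n_rows, n_cols = data.shape         # (number of rows, number of columns)
print(column_names)
print(n_rows, n_cols)
```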

 

 

-To get summary information about columns, index, data types, and memory usage:

data.info(verbose=True)

 

 

-To get summary statistics about the numerical columns in our data, describe() is a very helpful function, called as data.describe().
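A short sketch of describe() on a made-up frame follows; the column names and values are hypothetical and chosen only so the statistics are easy to check by hand.

```python
import pandas as pd

data = pd.DataFrame({"Income": [100.0, 200.0, 300.0, 400.0],
                     "Education": ["PhD", "Master", "PhD", "Basic"]})

summary = data.describe()   # numeric columns only by default
print(summary)

# include='all' adds non-numeric columns (unique, top, freq)
print(data.describe(include='all'))
```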

 

We can also get the mean, count, min, max, etc. individually with the functions below. On pandas 2.0 and later, pass numeric_only=True to mean(), median(), std(), and corr() when the DataFrame also contains non-numeric columns.

 

  • df.mean() Returns the mean of each column
  • df.count() Returns the number of non-null values in each column
  • df.max() Returns the highest value in each column
  • df.min() Returns the lowest value in each column
  • df.corr() Returns the pairwise correlation between columns
  • df.median() Returns the median of each column
  • df.std() Returns the standard deviation of each column
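The functions listed above can be tried on a small frame that mixes numeric and text columns. The data below is hypothetical, and numeric_only=True is passed explicitly so the text column is skipped.

```python
import pandas as pd

df = pd.DataFrame({"Income": [100.0, 200.0, None, 400.0],
                   "Kids": [0, 1, 2, 1],
                   "Education": ["PhD", "Master", "PhD", "Basic"]})

means = df.mean(numeric_only=True)      # skips the text column; NaN is ignored
counts = df.count()                     # non-null values per column
medians = df.median(numeric_only=True)
corr = df.corr(numeric_only=True)       # pairwise correlation of numeric columns
print(means, counts, medians, corr, sep="\n\n")
```

Note that count() reports 3 for Income because the missing value is not counted, which makes it a quick way to spot columns with gaps.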

 

Accessing individual rows and columns

  • Use iloc (integer location) to access specific rows and columns by position.
data.iloc[0:3, 0:2]  # first 3 rows and first 2 columns (end positions are excluded)

# We can store this selection in a new DataFrame: new_dataframe = data.iloc[0:3, 0:2]

 


 

  • To output selected rows and columns:

 

 

 

  • Instead of iloc, which selects by integer position, we can use loc to select by label.
# loc slices by label, so the end label (21) is included
lesser_data = data.loc[10:21, ["ID", "Year_Birth", "Education"]]
lesser_data.head(20)

 


 

  • Selecting specific records by applying conditions on Data:
# Boolean mask: True for rows whose Education is "Graduation"
graduated_id = lesser_data["Education"] == "Graduation"
print(graduated_id)

 

# Keep only the rows where the condition is True
lesser_data[lesser_data["Education"] == "Graduation"]
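The selection techniques in this section can be compared side by side on a small frame. The data below is hypothetical; the combined condition is an extra illustration that goes slightly beyond the single-condition example above.

```python
import pandas as pd

data = pd.DataFrame({"ID": [11, 12, 13, 14, 15],
                     "Year_Birth": [1957, 1954, 1965, 1984, 1981],
                     "Education": ["Graduation", "PhD", "Graduation", "Master", "Graduation"]})

by_position = data.iloc[1:3]   # positions 1 and 2; the end position is excluded
by_label = data.loc[1:3]       # labels 1, 2 and 3; the end label is included

# Combine conditions with & (and) or | (or), wrapping each one in parentheses
mask = (data["Education"] == "Graduation") & (data["Year_Birth"] > 1960)
selected = data[mask]
print(selected)
```

The parentheses around each condition matter because & and | bind more tightly than comparison operators in Python.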

 


 

Plotting data with Pandas

lesser_data['Year_Birth'].hist()

 

 

Pandas has numerous functions for plotting data, and methods such as hist() use Matplotlib under the hood. For more specialized data visualization, you can refer to the Matplotlib and Seaborn documentation.

 

Write Pandas Dataframe to a File:

We can use to_csv() to save a DataFrame as a CSV file.

data.to_csv('myDataFrame.csv')


To use a specific separator or character encoding, pass the sep and encoding parameters:

data.to_csv('myDataFrame.csv', sep='\t', encoding='utf-8')
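As a rough check that writing and reading are symmetric, the sketch below writes a small frame to an in-memory buffer and reads it back. The buffer is a stand-in for a file on disk, and index=False is an optional to_csv argument not used in the examples above.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Ritik", "Suveer"], "Marks": [100, 99]})

buffer = io.StringIO()             # in-memory stand-in for a file on disk
df.to_csv(buffer, index=False)     # index=False leaves the row labels out
csv_text = buffer.getvalue()
print(csv_text)

restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(df))
```

Without index=False, the row labels are written as an extra unnamed first column, which then reappears as a regular column when the file is read back.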


We can also use to_excel() to write our table to Excel:

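# Writing .xlsx files needs an Excel engine such as openpyxl installed.
# The with block saves and closes the file on exit
# (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter('myDataFrame.xlsx') as writer:
    data.to_excel(writer, sheet_name='DataFrame')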

Frequently Asked Questions

  • How can you sort your dataframe by a column?
    A DataFrame can be sorted with the sort_values() function:
     
df.sort_values(by='column1')
# or by multiple columns, in descending order
df.sort_values(by=['column1', 'column2'], ascending=False)

 

  • How can you concatenate two dataframes?
    Let df_1 and df_2 be two dataframes, then they can be concatenated by using pd.concat([ df_1, df_2 ])
     
  • Differentiate between numpy and Pandas.
    Pandas is used when we need to work with labeled, tabular data that may mix data types, whereas NumPy is preferred for homogeneous numerical arrays. Pandas objects generally consume more memory than NumPy arrays, and indexing a pandas Series is considerably slower than indexing a NumPy array. Pandas tends to perform better when the data has many rows.
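The sorting and concatenation answers above can be sketched together as follows. The frames df_1 and df_2 hold made-up values, and ignore_index=True is an optional concat argument added here so the combined rows get fresh labels.

```python
import pandas as pd

df_1 = pd.DataFrame({"column1": [3, 1], "column2": ["b", "a"]})
df_2 = pd.DataFrame({"column1": [2], "column2": ["c"]})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating old labels
combined = pd.concat([df_1, df_2], ignore_index=True)

# sort_values returns a new DataFrame; the original is left unchanged
ordered = combined.sort_values(by="column1")
print(ordered)
```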

Key Takeaways

In this blog, we learned about the pandas library, its data structures, and several important functions that come in handy while analyzing data. We have seen the implementation of these functions with some basic examples; I hope this blog gave you a good enough understanding of pandas. Please utilize this knowledge and dig deeper into data manipulation. Afterward, you can refer to our articles on Data Visualization Techniques.

Happy Coding!
