Table of contents
1.
Introduction
2.
Pandas read excel
2.1.
Code 
2.2.
Output 
3.
Parameters of pandas read excel
4.
Read Excel Files in Python Using Different Methods
4.1.
Method 1: Reading an Excel File Using Python with Pandas
4.2.
Method 2: Reading an Excel File Using Python with openpyxl
4.3.
Method 3: Reading an Excel File Using Python with Xlwings
5.
Understanding the parameter of panda read excel 
5.1.
io                 
5.1.1.
Code 
5.1.2.
Output
5.1.3.
Explanation
5.2.
sheet_name
5.2.1.
Code
5.2.2.
Output
5.2.3.
Explanation
5.3.
header
5.3.1.
Code
5.3.2.
Output
5.3.3.
Explanation
5.4.
use_col
5.4.1.
Method 1
5.4.2.
Method 2
5.4.3.
Output 1
5.4.4.
Output 2
5.4.5.
Explanation
5.5.
index_col
5.5.1.
Code
5.5.2.
Output
5.5.3.
Explanation
5.6.
dtype
5.6.1.
Code
5.6.2.
Output
5.6.3.
Explanation
5.7.
skiprow
5.7.1.
Code
5.7.2.
Output
5.7.3.
Explanation
5.8.
nrows
5.8.1.
Code
5.8.2.
 Output
5.8.3.
 Explanation
5.9.
skipfooter     
5.9.1.
Code
5.9.2.
Output
5.9.3.
Explanation 
6.
Frequently Asked Question
6.1.
How to read an xls file in pandas?
6.2.
What is the fastest way to read an Excel file in Python?
6.3.
Which is better, pandas or openpyxl?
6.4.
Can pandas read csv read excel?
6.5.
Which Python package read Excel file?
6.6.
What is the difference between read CSV and read Excel?
7.
Conclusion
Last Updated: Dec 1, 2024
Easy

How to Read Excel a File in Python?

Author Muskan Sharma
0 upvote
Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction

Reading Excel files is a common task in data analysis and automation, and Python makes it easy with its versatile libraries. Whether you need to extract data for processing, analyze spreadsheets, or automate reporting, Python provides powerful tools like pandas and openpyxl to simplify these tasks. In this guide, we'll understand the step-by-step process of how we can read Excel file in Python Pandas.

Pandas Read Excel

This article will help you understand the function pandas.read_excel in depth. So, let us dive into the topic and explore it more.

Pandas read excel

Pandas read excel ( pandas.read_excel() ) is a pandas function that reads an excel sheet with the extension xlsx into a pandas DataFrame object. In the Excel sheet, the data is in two-dimensional table form. In the DataFrame object, the data is also in a two-dimensional tabular form.

Let us understand this with the help of an example:


Excel sheet

Example Excel sheet

Code
 

import pandas
r=pandas.read_excel('Student_data.xlsx')
print(r)
You can also try this code with Online Python Compiler
Run Code


Output 

Output example

In the above example, we have an Excel file named Student_data and we print the data into DataFrame objects with the help of pandas.read_excel function.
 

The pandas read excel function has different parameters to read the excel file as per the user's requirement. So, let us look into some of the parameters.

Parameters of pandas read excel

This section of the article will give you a brief overview of the different parameters of the pandas read excel function:

Parameters

Description

ioIt is a valid string path directing to the workbook. The string could be any URL, local path, filename, or file-like object.
sheet_nameIt reads the sheet name of the workbook. A sheet name can only be a string, an integer can be used in zero-indexed sheet positions.
headerIt reads the header row. If there is no header, then use none. The dtype is an object. dtype is nothing but the datatype.
use_colIt parses all the columns if none is mentioned, and it will parse the column of that index that is mentioned. If the list of string or integer is mentioned, it will parse all the columns mentioned.
index_colIt reads the column mentioned in the index_col and will print the column at the index position. It passes None if there is no such column.
dtypeIt tells the type of data present in the file and mentions the type of data present in the column.
skiprowIt skips the row which is mentioned in the skiprow; it takes an integer value.
nrowsIt parses upto the given row mentioned in the nrows parameter.
skipfooterIt skips the number of rows from the bottom of the file which is being parsed and contains this parameter.

Now let us understand these parameters more with the help of examples.

Read Excel Files in Python Using Different Methods

Python offers various libraries to read Excel files, each suited to specific use cases. Before proceeding, ensure you have the necessary libraries installed and an input Excel file ready. Here, we demonstrate how to read Excel files using three popular methods.

Required Modules and Input File

  1. Install the required libraries using pip: 
    pip install pandas openpyxl xlwings
  2. Prepare an input Excel file, for example, sample.xlsx, containing sample data.

Method 1: Reading an Excel File Using Python with Pandas

The pandas library provides a straightforward way to read Excel files into a DataFrame, which is ideal for data manipulation and analysis.

import pandas as pd

# Reading the Excel file
file_path = "sample.xlsx"
df = pd.read_excel(file_path)

# Displaying the content
print("Data from the Excel file:")
print(df)

Output:

When executed, this code reads the content of sample.xlsx and displays it in a tabular format, like this:

Data from the Excel file:
   Name    Age   Department
0  Rohit   30     Finance
1  Rahul   25   Operations
2  Mohit   35       IT

Method 2: Reading an Excel File Using Python with openpyxl

openpyxl is a library designed for interacting with Excel files, supporting both .xlsx and .xlsm formats.

from openpyxl import load_workbook

# Loading the workbook
file_path = "sample.xlsx"
workbook = load_workbook(file_path)

# Selecting the active sheet
sheet = workbook.active

# Reading and displaying data
print("Data from the Excel file:")
for row in sheet.iter_rows(values_only=True):
    print(row)

Output:

This method reads and prints each row from the sample.xlsx file:

Data from the Excel file:
('Name', 'Age', 'Department')
('Rohit', 30, 'Finance')
('Rahul', 25, 'Operations')
('Mohit', 35, 'IT')

Method 3: Reading an Excel File Using Python with Xlwings

Xlwings allows you to interact with Excel using Python, making it ideal for tasks involving live Excel applications.

import xlwings as xw

# Opening the Excel file
file_path = "sample.xlsx"
app = xw.App(visible=False)
workbook = xw.Book(file_path)

# Selecting the first sheet
sheet = workbook.sheets[0]

# Reading and displaying data
data = sheet.range("A1").expand().value
print("Data from the Excel file:")
for row in data:
    print(row)

# Closing the workbook
workbook.close()
app.quit()

Output:

The code fetches and displays data from sample.xlsx dynamically:

Data from the Excel file:
['Name', 'Age', 'Department']
['John', 30, 'Finance']
['Jane', 25, 'Operations']
['Mark', 35, 'IT']

Understanding the parameter of panda read excel 

We have an Excel file having 2 sheets named "Student" and "Article". Using these sheets, we will see examples of the parameters we’ve discussed above.

io                 

It is a valid string path directing to the workbook. The string could be any URL, local path, filename, or file-like object. The path directing to the workbook must be in single quotes [  ' '  ].  

Code
 

import pandas

# printing file data using read_excel

read1=pandas.read_excel('C:/Users/Public/Pandas/Student_data.xlsx')

print(read1)

Output

io example

Explanation

As we see, we are reading the data from the Student_data.xlsx file using read_excel and printing the data into pandas DataFrame object. So in the above example, the valid string path directs to the file "C:/Users/Public/Pandas/Student_data.xlsx " i.e., "io"  in the code. We can see that the Student data is printed, i.e., the first sheet only, so if "sheet_name" is not mentioned, it will print the first sheet only by default.

sheet_name

It reads the sheet name of the workbook. A sheet name can only be a string, and an integer can be used in zero-indexed sheet positions.
 

Code

import pandas

#printing file data with sheet name

read2=pandas.read_excel('Student_data.xlsx',sheet_name='Article')

print(read2)


Output

Output sheet_name


Explanation

The sheet_name will print all the data present in that particular sheet. In the above example, we have used the Article as the sheet name, and then as you can see, all the data of the articles sheet is printed.

header

It reads the header row. If there is no header, then use none. The dtype is an object. dtype is nothing but the datatype.
 

Code

import pandas

read1=pandas.read_excel('Student_data.xlsx')

# To print the heading row

print(read1.columns.ravel())


Output

output (header)


Explanation

"read1.columns.ravel()" is used to get the data of the header row, which is printed using print(). We are using the read1.columns.ravel() to print the header as it will print the column names at index 1, which eventually is the header row.

use_col

It parses all the columns if none is mentioned, and it will parse the column of that index that is mentioned. If the list of string or integer is mentioned, it will parse all the columns mentioned.
 

Method 1

import pandas

#using usecols to prints the given columns by column number

read1=pandas.read_excel('Student_data.xlsx',usecols=[1,2])

print(read1)


Method 2

import pandas

#using usecols to print the given columns, you can use the heading name also

read1=pandas.read_excel('Student_data.xlsx',usecols=['Student ID','Student Name','Student Marks'])

print(read1)

 

Output 1

output 1 usecol


Output 2

output 2 usecol


Explanation

usecol=[ ]  to parse through the specific column and get the desired data. It can be done in two different ways:

  1. Using column number
  2. Using column name

index_col

It reads the column mentioned in the index_col and will print the column at the index position. We’ll pass None if there is no such column.

Code

import pandas

#using student ID as index_col

read1=pandas.read_excel('Student_data.xlsx',index_col='Student ID')

print(read1)


Output

indexcol output

Explanation

It will read the column mentioned in the index_col and will print the column at the index position. As you see in the above example, the "Student ID" has become the index of the data present.

dtype

It tells the type of data present in the file and also mentions the type of data present in the column.


Code

import pandas

read1=pandas.read_excel('Student_data.xlsx')

#using read1.dtypes to get the datatype of the read data

print(read1.dtypes)


Output

dtype output


Explanation

This is used to check the file's data type and each column's data type. Look at the above output; you will see that different columns have different datatypes, and the file's data type is also mentioned as an "object".

skiprow

It skips the row which is mentioned in the skiprow, it takes an integer value.
 

Code

import pandas

#using the row number to skip the row

read1=pandas.read_excel('Student_data.xlsx',skiprows=[1])

print(read1) 


Output

skiprow output


Explanation

This will skip the parsing of the data of the row mentioned; like in the above code, row 1 is skipped as the "skiprow=1".  

nrows

It will parse upto the given row mentioned in the nrows parameter.
 

Code

import pandas

# Use the number of rows to print only desired number of rows

read1=pandas.read_excel('Student_data.xlsx',nrows=3)

print(read1)

 
Output

nrows output

 
Explanation

This will parse into the desired number of rows only, as you can see in the above code "nrow=3"so it parses up to three rows.

skipfooter     

It is used to skip the number of rows from the bottom of the file which is being parsed and contains this parameter.
 

Code

import pandas

# Use the number of rows to skip the row from the last using skipfooter

read1=pandas.read_excel('Student_data.xlsx',skipfooter=1)

print(read1)


Output

skipfooter output


Explanation 

skipfooter is used to skip the data at the bottom of the page or the row present at the bottom. In the above example, the last row is skipped.

Frequently Asked Question

How to read an xls file in pandas?

Use the read_excel() function in pandas with the engine='xlrd' parameter for .xls files:

pd.read_excel('file.xls', engine='xlrd')  

What is the fastest way to read an Excel file in Python?

Using pandas with read_excel() and specifying engine='openpyxl' for .xlsx files is generally the fastest for most use cases.

Which is better, pandas or openpyxl?

Pandas is better for data analysis and manipulation, while openpyxl is preferred for advanced Excel file operations like formatting or writing macros.

Can pandas read csv read excel?

The pandas library can read data from CSV or excel files. The pandas function that can be used is read_csv() or read_excel(). The file path can only contain a string and must include the file name and extension.

Which Python package read Excel file?

Openpyxl is a tool that lets you read from or write to Excel files using Python. Data scientists use it to analyze and manipulate data, copy information, mine for insights, make charts, format sheets, add formulas, and automate tasks for data processing.

What is the difference between read CSV and read Excel?

CSV is a plain text format with a series of values separated by commas, whereas Excel is a binary file containing data for all worksheets in a workbook. CSV files are unable to perform data operations, but Excel can.

Conclusion

In this article, we learned the pandas library in Python, the pandas read excel function, and the different parameters of the pandas read excel function. We hope this article helped you in knowing more about the pandas read excel.

If you want to learn more, refer to these articles:

Live masterclass