Table of contents
1. Introduction
2. Reading a CSV File
2.1. Using the csv module
3. Using the pandas library
4. Reading a CSV File Format in Python
4.1. Using csv.DictReader() class
5. Using pandas.read_csv() method
6. Using csv.DictReader() class
7. Using pandas.read_csv() method
8. Frequently Asked Questions
8.1. What is the difference between csv.reader() and csv.DictReader()?
8.2. How do you handle missing values when reading a CSV file using pandas?
8.3. Can you read a CSV file from a URL using pandas?
9. Conclusion
Last Updated: Jun 20, 2024

Python Read CSV File

Author Rinki Deka

Introduction

CSV (Comma-Separated Values) is a simple file format used to store and organize data in a tabular structure. It's a plain text file where each line represents a row and columns are separated by commas. CSV files are widely used for data exchange between different applications. Python's standard library includes the csv module for reading and writing CSV files, and third-party libraries such as pandas make it easy to load and manipulate CSV data.

In this article, we'll discuss different ways to read CSV files using Python, including the csv module & pandas library. 

Reading a CSV File

To read a CSV file in Python, you can use the built-in csv module or the pandas library. Here's how you can read a CSV file using these methods:

Using the csv module

The csv module provides functionality to read & write CSV files. Here's an example of how to read a CSV file using the csv.reader() function:

import csv
with open('example.csv', 'r', newline='') as file:  # newline='' is what the csv docs recommend
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)


In this code, we first import the csv module. Then, we open the CSV file named 'example.csv' in read mode using the open() function. We create a csv.reader object by passing the file object to csv.reader(). Finally, we iterate over each row in the CSV file using a for loop & print each row.

Using the pandas library

The pandas library is a powerful data manipulation & analysis library for Python. It provides a convenient way to read CSV files using the read_csv() function. Here's an example:

import pandas as pd
data = pd.read_csv('example.csv')
print(data)


In this code, we import the pandas library as pd. We use the pd.read_csv() function to read the CSV file named 'example.csv'. The read_csv() function returns a DataFrame object, which is a two-dimensional labeled data structure. Finally, we print the DataFrame to see the contents of the CSV file.

Reading a CSV File Format in Python

Python provides various ways to read a CSV file, depending on your requirements. Let's explore a few commonly used methods:

Using csv.reader():


The csv.reader() function is part of the csv module & allows you to read a CSV file row by row. Here's an example:

import csv
with open('example.csv', 'r', newline='') as file:  # newline='' is what the csv docs recommend
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)


In this code, we open the CSV file using the open() function in read mode ('r'). We create a csv.reader object by passing the file object to csv.reader(). Then, we iterate over each row in the CSV file using a for loop & print each row.

The csv.reader() function treats each row as a list of strings, where each element represents a column value. This method is useful when you want to process the CSV data row by row.

Using csv.DictReader() class

The csv.DictReader() class is another way to read a CSV file using the csv module. It treats the first row of the CSV file as fieldnames & creates a dictionary for each subsequent row. 

For example:

import csv
with open('example.csv', 'r', newline='') as file:  # newline='' is what the csv docs recommend
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        print(row['Column1'], row['Column2'])


In this code, we create a csv.DictReader object by passing the file object to csv.DictReader(). 

Each row in the CSV file is treated as a dictionary, where the keys are the fieldnames from the first row. We can access the values of specific columns using the column names as keys.

This method is useful when you want to access the CSV data using column names instead of indices.

Using pandas.read_csv() method

The pandas library provides a powerful & convenient way to read CSV files using the read_csv() function. Here's an example:

import pandas as pd
data = pd.read_csv('example.csv')
print(data)


In this code, we use the pd.read_csv() function to read the CSV file. The function returns a DataFrame object, which is a two-dimensional labeled data structure. We can directly access columns using their names & perform various data manipulation operations using pandas.

Note: The pandas library is highly recommended for working with CSV files, especially when you have large datasets or need advanced data analysis capabilities.

Using csv.reader():


The csv.reader() function is a part of the csv module in Python & provides a convenient way to read CSV files. It allows you to iterate over the rows of a CSV file & access the values in each row as a list of strings. Here's a detailed explanation of how to use csv.reader() with an example:

import csv
with open('example.csv', 'r', newline='') as file:  # newline='' is what the csv docs recommend
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)


In this code:

  1. We start by importing the csv module, which provides functionality for working with CSV files.
     
  2. We use the open() function to open the CSV file named 'example.csv' in read mode ('r'). The with statement ensures that the file is properly closed after we're done reading from it.
     
  3. We create a csv.reader object by passing the file object to csv.reader(). This object allows us to iterate over the rows of the CSV file.
     
  4. We use a for loop to iterate over each row in the csv_reader object. Each row is returned as a list of strings, where each element represents a column value.
     
  5. Inside the loop, we print each row to display the contents of the CSV file.


Here's an example CSV file named 'example.csv':

Name,Age,City
Sanjana,22,New York
Ravi,23,London
Gaurav,25,Paris


When we run the code with this CSV file, the output will be:

['Name', 'Age', 'City']
['Sanjana', '22', 'New York']
['Ravi', '23', 'London']
['Gaurav', '25', 'Paris']


As you can see, each row is printed as a list of strings. The first row represents the header or column names, & the subsequent rows contain the actual data.

You can further process the data by accessing individual elements of each row using indexing. For example, row[0] would give you the value in the first column of the current row.
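
As a minimal sketch of this kind of index-based processing, the snippet below skips the header row with next() and converts the age column to an integer. It assumes the same three columns as the example file; io.StringIO stands in for example.csv so the code runs without a file on disk, and the names sample and people are illustrative only.

```python
import csv
import io

# Stand-in for example.csv (assumption: same contents as the example file).
sample = io.StringIO(
    "Name,Age,City\n"
    "Sanjana,22,New York\n"
    "Ravi,23,London\n"
    "Gaurav,25,Paris\n"
)

csv_reader = csv.reader(sample)
header = next(csv_reader)      # consume the header row so the loop sees only data

people = []
for row in csv_reader:
    name = row[0]              # first column
    age = int(row[1])          # values arrive as strings, so convert explicitly
    people.append((name, age))

print(header)
print(people)
```

Because csv.reader() never converts types for you, any numeric column has to be cast explicitly, as done with int() here.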

Note: csv.reader() is a simple but effective way to read CSV files in Python when you want to process the data row by row and don't need advanced functionality like accessing columns by their names.

Using csv.DictReader() class

The csv.DictReader() class is another way to read a CSV file using the csv module in Python. It provides a convenient way to access the CSV data using column names instead of indices. The first row of the CSV file is treated as the fieldnames or column headers, and each subsequent row is returned as a dictionary with keys corresponding to the fieldnames. 

Here's a detailed explanation of how to use csv.DictReader() with an example:

import csv
with open('example.csv', 'r', newline='') as file:  # newline='' is what the csv docs recommend
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        print(row['Name'], row['Age'], row['City'])


In this code:

  1. We start by importing the csv module.
     
  2. We use the open() function to open the CSV file named 'example.csv' in read mode ('r') using the with statement.
     
  3. We create a csv.DictReader object by passing the file object to csv.DictReader(). This object treats the first row of the CSV file as the fieldnames.
     
  4. We use a for loop to iterate over each row in the csv_reader object. Each row is returned as a dictionary, where the keys are the fieldnames from the first row of the CSV file.
     
  5. Inside the loop, we can access the values of specific columns using the column names as keys. In this example, we print the values of the 'Name', 'Age', and 'City' columns for each row.


Using the same example CSV file as before:

Name,Age,City
Sanjana,22,New York
Ravi,23,London
Gaurav,25,Paris


When we run the code with this CSV file, the output will be:

Sanjana 22 New York
Ravi 23 London
Gaurav 25 Paris


As you can see, we can directly access the values of specific columns using their corresponding column names as keys in the row dictionary.

The csv.DictReader() class provides a more intuitive way to work with CSV data when you have known column names and want to access the values using those names instead of indices.

One thing to note is that the fieldnames are case-sensitive, so make sure to use the correct column names when accessing the values.

Note: csv.DictReader() is particularly useful when you have a large number of columns in your CSV file, and accessing them by indices becomes cumbersome or error-prone.
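
The point about case-sensitive fieldnames can be checked with a short sketch like the one below. It inspects the fieldnames attribute of the DictReader object and then looks up a key with the wrong case. The in-memory sample (io.StringIO) is an assumption standing in for example.csv, and the variable names are illustrative.

```python
import csv
import io

# Stand-in for example.csv (assumption: same contents as the example file).
sample = io.StringIO(
    "Name,Age,City\n"
    "Sanjana,22,New York\n"
    "Ravi,23,London\n"
    "Gaurav,25,Paris\n"
)

csv_reader = csv.DictReader(sample)
fieldnames = csv_reader.fieldnames   # column names taken from the first row
rows = list(csv_reader)              # one dictionary per data row

first = rows[0]
exact_match = first.get('Name')      # key matches the header exactly
wrong_case = first.get('name')       # different case, so the key is not found

print(fieldnames)
print(exact_match, wrong_case)
```

Using get() instead of square brackets returns None for an unknown key rather than raising KeyError, which can be a gentler way to probe column names you are unsure about.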

Using pandas.read_csv() method

The pandas library in Python provides a powerful and convenient way to read CSV files using the read_csv() function. It offers a wide range of features and flexibility for working with CSV data. Here's a detailed explanation of how to use pandas.read_csv() with an example:

import pandas as pd
data = pd.read_csv('example.csv')
print(data)


In this code:

  1. We start by importing the pandas library and conventionally alias it as pd.
     
  2. We use the pd.read_csv() function to read the CSV file named 'example.csv'. The function returns a DataFrame object, which is a two-dimensional labeled data structure.
     
  3. We assign the resulting DataFrame to the variable data.
     
  4. Finally, we print the DataFrame using print(data) to display the contents of the CSV file.


Using the same example CSV file as before:

Name,Age,City
Sanjana,22,New York
Ravi,23,London
Gaurav,25,Paris


When we run the code with this CSV file, the output will be:

      Name  Age      City
0  Sanjana   22  New York
1     Ravi   23    London
2   Gaurav   25     Paris


The DataFrame displays the data in a tabular format, with the column names as headers and the rows representing the data records.

The pandas library provides numerous advantages when working with CSV files:

  1. Automatic data type inference: pandas automatically infers the data types of the columns based on the values in the CSV file.
     
  2. Easy data manipulation: pandas provides a wide range of functions and methods to manipulate and analyze the data, such as filtering, sorting, grouping, and aggregating.
     
  3. Handling missing data: pandas has built-in support for handling missing or null values in the CSV file.
     
  4. Integration with other libraries: pandas seamlessly integrates with other data analysis and visualization libraries, such as NumPy and Matplotlib.
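
To illustrate the data type inference mentioned in the list above, here is a small sketch that loads the example data and inspects the inferred column types. The in-memory sample is an assumption standing in for example.csv; the exact dtype names printed can vary between pandas versions, so no output is shown.

```python
import io
import pandas as pd

# Stand-in for example.csv (assumption: same contents as the example file).
sample = io.StringIO(
    "Name,Age,City\n"
    "Sanjana,22,New York\n"
    "Ravi,23,London\n"
    "Gaurav,25,Paris\n"
)

data = pd.read_csv(sample)

# dtypes lists the type pandas inferred for each column.
print(data.dtypes)

# Age was parsed as an integer column, so arithmetic works without conversion.
age_is_integer = pd.api.types.is_integer_dtype(data['Age'])
total_age = data['Age'].sum()
print(age_is_integer, total_age)
```

This contrasts with the csv module, where every value is returned as a string.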
     

Here are a few examples of data manipulation using pandas:

  1. Accessing columns: You can access individual columns of the DataFrame using the column names. For example, data['Name'] would give you the 'Name' column as a Series.
     
  2. Filtering data: You can filter the DataFrame based on conditions. For example, data[data['Age'] > 30] would return a new DataFrame containing only the rows where the 'Age' column is greater than 30 (an empty DataFrame for the example file above, since every age in it is below 30).
     
  3. Sorting data: You can sort the DataFrame based on one or more columns. For example, data.sort_values('Age') would sort the DataFrame based on the 'Age' column in ascending order.
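
The three operations above can be sketched together as follows. The data is an in-memory copy of the example file (an assumption so the snippet runs on its own), and the threshold of 22 is chosen only so that the filter returns some rows.

```python
import io
import pandas as pd

# Stand-in for example.csv (assumption: same contents as the example file).
sample = io.StringIO(
    "Name,Age,City\n"
    "Sanjana,22,New York\n"
    "Ravi,23,London\n"
    "Gaurav,25,Paris\n"
)
data = pd.read_csv(sample)

names = data['Name']                                    # a single column, as a Series
over_22 = data[data['Age'] > 22]                        # rows where Age is greater than 22
by_age_desc = data.sort_values('Age', ascending=False)  # oldest first

print(names.tolist())
print(over_22)
print(by_age_desc)
```

Each of these operations returns a new object and leaves the original DataFrame unchanged.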
     

Note: pandas.read_csv() is highly recommended when working with CSV files, especially when you have large datasets or need advanced data manipulation and analysis capabilities.
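
For files too large to load comfortably at once, one option is the chunksize parameter of read_csv(), which yields the file in pieces instead of a single DataFrame. The sketch below is only an illustration: it uses the tiny example data held in memory (an assumption) and a chunk size of 2 so that more than one chunk is produced.

```python
import io
import pandas as pd

# Stand-in for a large CSV file (assumption: the small example data is used here).
sample = io.StringIO(
    "Name,Age,City\n"
    "Sanjana,22,New York\n"
    "Ravi,23,London\n"
    "Gaurav,25,Paris\n"
)

row_count = 0
total_age = 0

# With chunksize set, read_csv returns an iterator of DataFrames.
for chunk in pd.read_csv(sample, chunksize=2):
    row_count += len(chunk)
    total_age += int(chunk['Age'].sum())

print(row_count, total_age)
```

Only one chunk is held in memory at a time, so running totals like these can be computed without loading the whole file.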

Frequently Asked Questions

What is the difference between csv.reader() and csv.DictReader()?

csv.reader() returns each row of the CSV file as a list of strings, where each element represents a column value. You access the values using indices. On the other hand, csv.DictReader() treats the first row as fieldnames and returns each subsequent row as a dictionary, where the keys are the fieldnames and the values are the corresponding column values. This allows you to access the values using the column names as keys.
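
A side-by-side sketch of the difference, reading the same text with both readers. The one-row sample held in memory is an assumption made to keep the snippet self-contained.

```python
import csv
import io

text = "Name,Age,City\nSanjana,22,New York\n"

# csv.reader: every row is a list, and the header is just the first list.
list_rows = list(csv.reader(io.StringIO(text)))
first_by_index = list_rows[1][0]        # data starts at index 1, column 0 is Name

# csv.DictReader: the header row is consumed and becomes the dictionary keys.
dict_rows = list(csv.DictReader(io.StringIO(text)))
first_by_name = dict_rows[0]['Name']    # data starts at index 0, looked up by name

print(first_by_index, first_by_name)
```

Both lookups return the same value; the only difference is whether you address the column by position or by name.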

How do you handle missing values when reading a CSV file using pandas?

When reading a CSV file using pandas.read_csv(), empty fields and common markers such as 'NA' and 'NaN' are read as missing values and represented as NaN (Not a Number) by default. You can have additional strings treated as missing by passing them to the na_values parameter. Additionally, pandas provides methods like isnull(), fillna(), and dropna() to detect, fill, and drop missing values in the DataFrame.
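
A minimal sketch of these options is shown below. The sample data is hypothetical: one Age field is left empty and one City is recorded as 'unknown', which is passed to na_values so that pandas treats it as missing too. The replacement values 0 and 'N/A' are arbitrary choices for illustration.

```python
import io
import pandas as pd

# Hypothetical data with an empty Age and a placeholder City value.
sample = io.StringIO(
    "Name,Age,City\n"
    "Sanjana,22,New York\n"
    "Ravi,,London\n"
    "Gaurav,25,unknown\n"
)

# Treat the string 'unknown' as missing, in addition to the defaults.
data = pd.read_csv(sample, na_values=['unknown'])

missing_counts = data.isnull().sum()                # missing values per column
filled = data.fillna({'Age': 0, 'City': 'N/A'})     # replace missing values per column
complete = data.dropna()                            # keep only rows with no missing values

print(missing_counts)
print(filled)
print(complete)
```

Note that a numeric column containing a missing value is read as floating point, because NaN is a float.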

Can you read a CSV file from a URL using pandas?

Yes, pandas allows you to read a CSV file directly from a URL using the read_csv() function. Instead of providing a file path, you can pass the URL as a string to read_csv(). For example, pd.read_csv('https://example.com/data.csv') would read the CSV file from the specified URL. Make sure you have an active internet connection to fetch the file from the URL.

Conclusion

In this article, we learned about different ways to read CSV files using Python. We discussed the csv module, which provides the csv.reader() function for reading CSV files row by row and the csv.DictReader() class for accessing CSV data using column names. We also learned about the pandas library and its read_csv() function, which offers powerful data manipulation and analysis capabilities when working with CSV files.
