Table of contents
1. Introduction
2. What is a Data Frame?
2.1. Installing Pandas
3. Selecting Dataframes
3.1. Column Selection
3.2. Row Selection
3.3. Boolean Indexing
4. Extracting Dataframes
4.1. Integer-based Extraction
4.2. Label-based Extraction
5. Slicing Dataframes
5.1. Integer-based Slicing
5.2. Label-based Slicing
6. Frequently Asked Questions
6.1. What is the difference between a Dataframe and a Series in Pandas?
6.2. What parameters can be used with the read_csv() function?
6.3. What is the difference between selecting and slicing in Pandas?
7. Conclusion
Last Updated: Mar 27, 2024

Selecting, Extracting and Slicing Dataframes in Pandas

Author Abhinav Anand

Introduction

Pandas is an open-source Python library widely used for data manipulation and analysis. It provides data structures and functions that make working with structured data more efficient.


In this article, you will learn about selecting, extracting, and slicing dataframes in Pandas.

Let’s get started.

What is a Data Frame?

A Dataframe is one of the core data structures provided by Pandas. It is a two-dimensional tabular data structure where data is organized into rows and columns. Each column can contain different data types, such as integers, floats, strings, etc.
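
As a quick illustration of mixed column types, the sketch below builds a small dataframe by hand. The names and values are hypothetical toy data, not taken from the Titanic dataset; the point is only that each column carries its own data type.

```python
import pandas as pd

# Hypothetical toy data; each column holds a different data type
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chen"],  # strings
    "Age": [29.0, 41.5, 35.0],         # floats
    "Pclass": [1, 3, 2],               # integers
})

print(df.shape)   # (3, 3): three rows, three columns
print(df.dtypes)  # one data type per column
```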

Before moving forward, let's quickly see how you can install Pandas.

Installing Pandas

Run the following command in your terminal.


Bash

pip install pandas

 


We will use the Titanic dataset for all the examples in this article. You can download it from here.

Let’s look at how you can read and store data from CSV files in a dataframe.


Python

import pandas as pd

df = pd.read_csv("./titanic/test.csv")

print(df.head())

Here, the read_csv() function reads the downloaded CSV file (adjust the path to wherever you saved it) and returns a dataframe containing its data, and the head() method returns the first five rows of that dataframe as a quick preview.
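
If you want to try read_csv() and head() without downloading anything, the sketch below reads a small CSV held in memory through io.StringIO, which read_csv() accepts like a file. The CSV text is hypothetical toy data. It also shows that head() takes an optional row count.

```python
import io
import pandas as pd

# In-memory CSV text standing in for a file on disk (hypothetical data)
csv_text = """PassengerId,Name,Age
1,Alice,29
2,Bob,41
3,Chen,35
4,Dana,22
5,Eli,60
6,Fay,17
"""

df = pd.read_csv(io.StringIO(csv_text))  # read_csv also accepts file-like objects

print(df.head())    # first 5 rows (the default)
print(df.head(2))   # first 2 rows
```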

Selecting Dataframes

Selecting means picking out specific rows or columns from a Pandas dataframe. Let’s look at the various methods Pandas provides for selecting data.

Column Selection

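You can select one or more columns from a dataframe by name, using either dot notation (df.column_name) or bracket notation (df['column_name']). Dot notation only works when the column name is a valid Python identifier that does not clash with an existing dataframe attribute or method; bracket notation always works. Let’s look at an example.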


Python

names = df['Name']  # equivalent to df.Name
print(names)

In this example, we selected the entire Name column from the Titanic dataframe using bracket notation. A single column is returned as a Series.
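
To select more than one column, pass a list of column names inside the brackets. The sketch below uses a small hypothetical dataframe rather than the Titanic file, and contrasts the Series returned for a single name with the DataFrame returned for a list.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chen"],
    "Age": [29, 41, 35],
    "Fare": [7.25, 71.28, 8.05],
})

names = df["Name"]            # one label -> Series
subset = df[["Name", "Age"]]  # list of labels -> DataFrame with those columns

print(type(names).__name__)   # Series
print(type(subset).__name__)  # DataFrame
print(subset)
```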

Row Selection

You can select a specific row by its integer position using the .iloc[] indexer. In the following example, df.iloc[0] returns the first row as a Series whose index is the column names.


Python

selected_row = df.iloc[0]
print(selected_row)
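
The sketch below, on a hypothetical toy dataframe, shows a few more ways to select rows by position with .iloc[]: a negative position counts from the end, and a list of positions returns a DataFrame instead of a Series.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"Name": ["Alice", "Bob", "Chen"], "Age": [29, 41, 35]})

first = df.iloc[0]       # one position -> Series indexed by the column names
last = df.iloc[-1]       # negative positions count from the end
some = df.iloc[[0, 2]]   # list of positions -> DataFrame with those rows

print(first["Name"])  # Alice
print(last["Name"])   # Chen
print(some)
```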

Boolean Indexing

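You can filter rows with a boolean condition. An expression such as df['Age'] < 10 produces a Series of True/False values, one per row, and passing it inside the brackets keeps only the rows where the value is True. Let’s look at an example.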


Python

filtered = df[df['Name']=='Kelly, Mr. James']
print(filtered)

Here, we kept only the rows whose Name value is exactly 'Kelly, Mr. James'.


Python

filtered = df[df['Age']<10]
print(filtered.head())

In this example, the filtering was done based on the Age column value.
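
Conditions can also be combined. Python's and/or keywords do not work element-wise on a Series, so Pandas uses & (and), | (or) and ~ (not), with each condition wrapped in parentheses. The sketch below uses a hypothetical toy dataframe and also shows isin() for matching against a list of values.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chen", "Dana"],
    "Sex": ["female", "male", "male", "female"],
    "Age": [8, 41, 5, 30],
})

# Wrap each condition in parentheses, because & and | bind more tightly than < and ==
young_boys = df[(df["Age"] < 10) & (df["Sex"] == "male")]
young_or_female = df[(df["Age"] < 10) | (df["Sex"] == "female")]

# isin() keeps rows whose value appears in the given list
picked = df[df["Name"].isin(["Alice", "Dana"])]

print(young_boys)
print(young_or_female)
print(picked)
```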

The following section will teach you about extracting dataframes.

Extracting Dataframes

Extracting means retrieving the single value stored at a particular row and column of a dataframe.

There are two ways to extract values from dataframes: integer-based and label-based.

Integer-based Extraction

To extract a single value by its integer row and column positions, use the .iat[] indexer, which has the following syntax:


Syntax

dataframe.iat[row_index, col_index]

 

Let’s take a look at an example.


Python

extracted_value = df.iat[0, 2]
print(extracted_value)

Here, we extracted the value at row position 0 and column position 2. Positions are zero-based, so this is the value in the first row and the third column.
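
The sketch below, on a hypothetical toy dataframe, shows that .iloc[row, col] returns the same scalar as .iat[row, col], and that .iat[] can also be used to assign a single value. .iat[] is intended for getting or setting exactly one value; use .iloc[] when you need several.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["Alice", "Bob", "Chen"],
    "Age": [29, 41, 35],
})

value = df.iat[1, 2]   # row position 1, column position 2 -> Bob's Age
same = df.iloc[1, 2]   # .iloc returns the same scalar for a single position pair
print(value, same)     # 41 41

df.iat[1, 2] = 42      # .iat can also set a single value
print(df.iat[1, 2])    # 42
```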

Label-based Extraction

To extract a single value by its row and column labels, use the .at[] indexer, which has the following syntax:


Syntax

dataframe.at[row_label, col_label]

 

Row labels are the values in the dataframe's index (by default 0, 1, 2, and so on), and column labels are the column names.

Now, let’s take a look at an example.


Python

extracted_value = df.at[0, "Fare"]
print(extracted_value)

Here we extracted the value using the row label 0 and column label Fare.
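
With the default index, row labels and row positions happen to coincide, so .at[] and .iat[] can look interchangeable. The sketch below, on a hypothetical toy dataframe, shows where they differ: after filtering, the remaining rows keep their original labels, so a label and a position no longer match.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chen", "Dana"],
    "Age": [8, 41, 5, 30],
    "Fare": [7.25, 71.28, 8.05, 53.10],
})

# After filtering, the surviving rows keep their original labels (0 and 2)
children = df[df["Age"] < 10]
print(children.index.tolist())  # [0, 2]

print(children.at[2, "Fare"])   # row label 2 -> Chen's fare, 8.05
print(children.iat[1, 2])       # row position 1, column position 2 -> also 8.05
# children.at[1, "Fare"] would raise a KeyError: no row is labelled 1 any more
```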

In the next section, you will learn about slicing dataframes in Pandas.

Slicing Dataframes

Slicing is a type of selection in which a contiguous range of rows and/or columns is taken from a dataframe. It lets you extract specific portions of the data for further manipulation.

There are two ways to slice dataframes: integer-based and label-based.

Integer-based Slicing

To slice rows and columns by their integer positions, use the .iloc[] indexer with the following syntax. As with standard Python slices, the start position is included and the end position is excluded.


Syntax

dataframe.iloc[row_start:row_end, col_start:col_end]

 

Let’s take a look at some examples.


Python

sliced_df = df.iloc[0:3, 0:2]  # Rows 0 to 2 and Columns 0 to 1
print(sliced_df)

Here, we selected rows 0 to 2 and columns 0 to 1; the end positions 3 and 2 are excluded. The next example selects contiguous columns using the .iloc[] indexer while keeping every row.


Python

sliced_df = df.iloc[:, 0:2]  # Columns 0 to 1
print(sliced_df)

Because we used a bare colon (:) for the rows, the result contains all the rows and only columns 0 and 1.
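
.iloc[] slices follow the same rules as Python list slices, so negative positions and step values also work. The sketch below demonstrates this on a hypothetical ten-row dataframe.

```python
import pandas as pd

# Hypothetical toy data: ten rows, two columns
df = pd.DataFrame({"n": range(10), "sq": [i * i for i in range(10)]})

last_three = df.iloc[-3:]       # negative start: the last three rows
every_other = df.iloc[::2, :]   # step of 2: rows 0, 2, 4, 6, 8
first_col = df.iloc[:5, :1]     # first five rows, first column only (still a DataFrame)

print(last_three)
print(every_other.index.tolist())  # [0, 2, 4, 6, 8]
print(first_col.shape)             # (5, 1)
```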

Label-based Slicing

To slice rows and columns by their labels, use the .loc[] indexer with the following syntax. Unlike .iloc[], a .loc[] slice includes both the start label and the end label.


Syntax

dataframe.loc[row_label_start:row_label_end, col_label_start:col_label_end]

 

Now, let’s take a look at an example.


Python

sliced_df = df.loc[5:10, "Name":"Age"]
print(sliced_df)

Here, we selected the rows labelled 5 through 10 and the columns from Name through Age. Because label slices include both endpoints, the result contains 6 rows.
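
The difference in endpoint handling is easy to miss, so the sketch below applies the same numbers to .iloc[] and .loc[] on a hypothetical toy dataframe with the default index.

```python
import pandas as pd

# Hypothetical toy data with the default index 0..4
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Chen", "Dana", "Eli"],
    "Sex": ["female", "male", "male", "female", "male"],
    "Age": [29, 41, 35, 30, 60],
})

by_position = df.iloc[1:3]      # end position 3 excluded -> rows 1, 2
by_label = df.loc[1:3]          # end label 3 included -> rows 1, 2, 3
cols = df.loc[:, "Name":"Sex"]  # label slices on columns are inclusive too

print(by_position.index.tolist())  # [1, 2]
print(by_label.index.tolist())     # [1, 2, 3]
print(cols.columns.tolist())       # ['Name', 'Sex']
```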

Frequently Asked Questions

What is the difference between a Dataframe and a Series in Pandas?

Both Dataframe and Series are data structures provided by the Pandas library. A Dataframe is a two-dimensional, table-like structure, while a Series is a one-dimensional, array-like structure; each column of a Dataframe is a Series.
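
A quick way to see the difference is to compare the ndim and shape attributes, as in this sketch on a hypothetical toy dataframe:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [29, 41]})
ages = df["Age"]  # a single column of a DataFrame is a Series

print(df.ndim, df.shape)      # 2 (2, 2)
print(ages.ndim, ages.shape)  # 1 (2,)
print(ages.to_frame().shape)  # (2, 1): a Series converted back to a one-column DataFrame
```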

What parameters can be used with the read_csv() function?

Some commonly used parameters include delimiter, header, usecols, skiprows, nrows, and na_values. These parameters allow you to customize how the CSV file is read and loaded.
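
The sketch below combines several of these parameters on a hypothetical semicolon-separated file held in memory. The data, the comment line and the "?" placeholder are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical data: a comment line on top, ";" as separator, "?" for missing values
raw = """# exported passenger list
PassengerId;Name;Age;Fare
1;Alice;29;7.25
2;Bob;?;71.28
3;Chen;35;8.05
4;Dana;30;53.10
"""

df = pd.read_csv(
    io.StringIO(raw),
    delimiter=";",            # column separator (alias of sep)
    skiprows=1,               # skip the comment line
    header=0,                 # first remaining line holds the column names
    usecols=["Name", "Age"],  # load only these columns
    nrows=3,                  # read at most 3 data rows
    na_values=["?"],          # treat "?" as missing (NaN)
)

print(df)
```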

What is the difference between selecting and slicing in Pandas?

Selecting means choosing specific columns and/or rows from a dataframe, while slicing refers to extracting contiguous subsets of rows and/or columns from a Dataframe or Series. Both are ways of accessing specific data from a dataframe.

Conclusion

In this article, you learned how to install Pandas and how to select, extract, and slice information from dataframes, with the help of examples.


Happy Learning!
