Introduction
Pandas is an open-source Python library widely used for data manipulation and analysis. It provides data structures and functions that make working with structured data more efficient.
In this article, you will learn about selecting, extracting, and slicing dataframes in Pandas.
Let’s get started.
What is a Data Frame?
A Dataframe is one of the core data structures provided by Pandas. It is a two-dimensional tabular data structure where data is organized into rows and columns. Each column can contain different data types, such as integers, floats, strings, etc.
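As a minimal sketch of this idea, the snippet below builds a small dataframe by hand and prints its shape and per-column data types. The passengers name and the sample rows are made up for illustration and are not taken from the Titanic dataset.

```python
import pandas as pd

# Hypothetical sample data: three rows, three columns of different types.
passengers = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3],        # integers
        "Name": ["Alice", "Bob", "Carol"],  # strings
        "Fare": [7.25, 71.28, 8.05],     # floats
    }
)

print(passengers.shape)   # (number of rows, number of columns)
print(passengers.dtypes)  # one data type per column
```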
Before moving forward, let's quickly see how you can install Pandas.
Installing Pandas
Run the following command in your terminal.
Bash
pip install pandas
We will use the Titanic dataset for all the examples in this article. You can download it from here.
Let’s look at how you can read and store data from CSV files in a dataframe.
Python
import pandas as pd
df = pd.read_csv("./titanic/test.csv")
df.head()
Here, the read_csv() function reads data from the downloaded CSV file and returns a dataframe containing that data, and the head() function displays a preview of that dataframe.
Selecting Dataframes
Selecting involves extracting specific rows or columns from a Pandas dataframe. Let’s look at the various methods available in Pandas to select data from dataframes.
Column Selection
You can select one or more columns from a dataframe using their names. You can either use dot notation (df.column_name) or bracket notation (df['column_name']) to access columns. Let’s look at an example.
Python
col_name = df['Name']  # or df.Name
print(col_name)
We selected an entire column from the Titanic dataframe in this example using the column name and bracket notation.
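Selecting several columns at once works by passing a list of names inside the brackets. The sketch below uses a small hand-made dataframe (df_demo and its rows are hypothetical, not the Titanic file) so that it runs on its own. Note that dot notation only works when the column name is a valid Python identifier and does not clash with an existing dataframe attribute.

```python
import pandas as pd

# Hypothetical stand-in for a few Titanic-style columns.
df_demo = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Carol"],
        "Age": [34.0, 12.0, 47.0],
        "Fare": [7.25, 71.28, 8.05],
    }
)

name_col = df_demo["Name"]          # single column, returned as a Series
same_col = df_demo.Name             # dot notation, same column
subset = df_demo[["Name", "Age"]]   # list of names, returned as a dataframe

print(subset)
```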
Row Selection
You can select a specific row using the row index. In the following example, we will use the .iloc[] indexer for performing the required integer-based indexing.
Python
selected_row = df.iloc[0]
print(selected_row)
In this example, .iloc[0] returns the first row of the dataframe as a Series. Rows can also be selected by condition, for instance by filtering on the Age column value.
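A minimal sketch of row selection by position and by a condition on the Age column is shown below. The df_demo dataframe and the threshold of 18 are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical sample rows, not the real Titanic data.
df_demo = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Carol", "Dan"],
        "Age": [34.0, 12.0, 47.0, 8.0],
    }
)

first_row = df_demo.iloc[0]               # row at integer position 0
adults = df_demo[df_demo["Age"] >= 18]    # boolean mask keeps matching rows

print(first_row)
print(adults)
```

The comparison df_demo["Age"] >= 18 produces a Series of True and False values, and indexing the dataframe with it keeps only the rows marked True. The original row labels are preserved in the result.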
The following section will teach you about extracting dataframes.
Extracting Dataframes
Extracting means retrieving the single value stored at a particular row and column of the dataframe.
There are two ways of extracting information from dataframes: integer-based and label-based.
Integer-based Extraction
You can use integer-based indexing to select specific values by their integer positions within the dataframe using the .iat[] indexer, which takes a row position and a column position: df.iat[row_position, column_position].
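The following sketch shows .iat[] on a small hand-made dataframe. The df_demo rows are hypothetical, and columns.get_loc() is used here only to avoid hard-coding the position of the Fare column.

```python
import pandas as pd

# Made-up rows; the real Titanic file has more columns.
df_demo = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Carol"],
        "Age": [34.0, 12.0, 47.0],
        "Fare": [7.25, 71.28, 8.05],
    }
)

fare_pos = df_demo.columns.get_loc("Fare")  # integer position of the Fare column
fare_value = df_demo.iat[0, fare_pos]       # value in the first row, Fare column

print(fare_value)
```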
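Label-based Extraction
You can use label-based indexing to select a specific value by its row and column labels using the .at[] indexer. For example, df.at[0, 'Fare'] extracts the value using the row label 0 and column label Fare.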
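Label-based extraction with .at[] can be sketched as follows. The df_demo data and the PassengerId values are hypothetical. The second lookup sets PassengerId as the index to show that .at[] matches row labels rather than positions.

```python
import pandas as pd

# Hypothetical sample rows.
df_demo = pd.DataFrame(
    {
        "PassengerId": [892, 893, 894],
        "Name": ["Alice", "Bob", "Carol"],
        "Fare": [7.25, 71.28, 8.05],
    }
)

# With the default index, the row labels are 0, 1, 2.
fare_default = df_demo.at[0, "Fare"]

# After set_index, the row labels are the PassengerId values.
by_id = df_demo.set_index("PassengerId")
fare_by_id = by_id.at[892, "Fare"]

print(fare_default, fare_by_id)
```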
In the next section, you will learn about slicing dataframes in Pandas.
Slicing Dataframes
Slicing is a type of dataframe selection where a contiguous subset of rows and columns is selected from a dataframe. It allows you to extract specific portions of data for further manipulation.
There are two ways of slicing dataframes: integer-based and label-based slicing.
Integer-based Slicing
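You can use integer-based indexing to select rows and columns by their integer positions within the dataframe using the .iloc[] indexer, which uses the following syntax: df.iloc[row_start:row_stop, column_start:column_stop]. As with Python list slicing, the stop position is excluded.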
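A minimal sketch of integer-based slicing with .iloc[] is given below, assuming a small hypothetical dataframe named df_demo rather than the Titanic file.

```python
import pandas as pd

# Hypothetical sample rows.
df_demo = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Carol", "Dan"],
        "Age": [34.0, 12.0, 47.0, 8.0],
        "Fare": [7.25, 71.28, 8.05, 26.0],
    }
)

# Rows at positions 1 and 2, columns at positions 0 and 1.
# The stop positions (3 and 2) are not included.
block = df_demo.iloc[1:3, 0:2]

print(block)
```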
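Label-based Slicing
You can use label-based indexing to select rows and columns by their labels using the .loc[] indexer. For example, df.loc[5:10, 'Name':'Age'] selects the columns from Name to Age and rows 5 to 10. Unlike .iloc[], a .loc[] slice includes both of its endpoints.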
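Label-based slicing with .loc[] can be sketched as follows. The df_demo dataframe is generated data with a Titanic-like column order (an assumption made for illustration), so the snippet runs without the CSV file.

```python
import pandas as pd

# Twelve generated rows; the values are placeholders.
df_demo = pd.DataFrame(
    {
        "PassengerId": list(range(1, 13)),
        "Name": [f"Passenger {i}" for i in range(1, 13)],
        "Sex": ["female", "male"] * 6,
        "Age": [20.0 + i for i in range(12)],
        "Fare": [10.0] * 12,
    }
)

# Row labels 5 through 10 and columns Name through Age, endpoints included.
block = df_demo.loc[5:10, "Name":"Age"]

print(block)
```

Because both endpoints are included, this slice returns six rows (labels 5 to 10) and three columns (Name, Sex, Age).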
Frequently Asked Questions
What is the difference between a Dataframe and a Series in Pandas?
Both Dataframe and Series are types of data structures provided by the Pandas library. A Dataframe is a two-dimensional data structure that resembles a table, while a Series is a one-dimensional array-like structure.
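The difference in dimensionality can be checked directly, as in this short sketch. The ages and table names and their values are hypothetical.

```python
import pandas as pd

ages = pd.Series([34.0, 12.0, 47.0], name="Age")   # one-dimensional
table = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Carol"],
        "Age": ages,
    }
)                                                  # two-dimensional

print(ages.ndim, ages.shape)
print(table.ndim, table.shape)

# Selecting one column of a dataframe gives back a Series.
age_col = table["Age"]
```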
What parameters can be used with the read_csv() function?
Some commonly used parameters include delimiter, header, usecols, skiprows, nrows, and na_values. These parameters allow you to customize how the CSV file is read and loaded.
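As a hedged illustration of these parameters, the sketch below reads a small in-memory CSV instead of a file on disk. The text content, the semicolon separator, and the "?" missing-value marker are all assumptions made up for this example.

```python
import io

import pandas as pd

# Hypothetical CSV text with a leading comment line and ';' as separator.
raw_csv = io.StringIO(
    "# exported by a hypothetical tool\n"
    "PassengerId;Name;Age;Fare\n"
    "1;Alice;34;7.25\n"
    "2;Bob;?;71.28\n"
    "3;Carol;47;8.05\n"
)

df_small = pd.read_csv(
    raw_csv,
    delimiter=";",                    # field separator
    skiprows=1,                       # skip the first line of the file
    header=0,                         # first remaining line holds column names
    usecols=["Name", "Age", "Fare"],  # keep only these columns
    nrows=2,                          # read only the first two data rows
    na_values=["?"],                  # treat "?" as a missing value
)

print(df_small)
```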
What is the difference between selecting and slicing in Pandas?
Selecting involves choosing specific columns and rows from a dataframe, while slicing refers to extracting contiguous subsets of rows and/or columns from a dataframe or Series. Both are ways of accessing specific data from a dataframe.
Conclusion
In this article, you learned how to install Pandas and the process of selecting, extracting, and slicing information from dataframes with the help of examples.