Introduction
In Pandas, Time series and Timedeltas are two very important data structures to work with time-related data. Time series data holds immense importance across diverse fields, from finance to healthcare. At the same time, the concept of timedelta plays an important role in managing time durations.
In this article, we will be looking into how to use Pandas in order to work with time series data and timedelta objects.
What is a Time Series?
A time series is a sequence of data points recorded in chronological order over a given period of time. The data points in the series can be numeric, categorical, or mixed. Time series data helps us find patterns or trends over a period of time.
Time series appear in problems such as tracking the price of a stock, the number of website visitors, or the weather conditions.
Creating a Time series index
DatetimeIndex is used to create a timestamp index for time series data. The pd.date_range() function generates a sequence of timestamps based on a specified frequency.
Here, the pd.date_range() function generates a range of dates. We then create a DataFrame named df, indexed by those dates, with a column 'value' that contains a random value for each date in the range.
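A minimal sketch of that setup follows. The start date, the ten daily periods, and the seeded random generator are assumptions chosen so that the dates used later in this article ('2023-01-03' to '2023-01-08') fall inside the range.

```python
import numpy as np
import pandas as pd

# Assumed range: ten daily timestamps starting on 2023-01-01
dates = pd.date_range(start="2023-01-01", periods=10, freq="D")

# Seeded generator so the random values are reproducible (an illustrative choice)
rng = np.random.default_rng(seed=0)

# One random integer per date, stored in a column named 'value'
df = pd.DataFrame({"value": rng.integers(1, 100, size=len(dates))}, index=dates)

print(df)
print(type(df.index))  # the index is a DatetimeIndex
```

Because the dates are passed as the index, pandas builds a DatetimeIndex, which is what enables the date-based selection and resampling in the following sections.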
Selecting Data for a Specific Date
To select the data for a specific date, we set selected_date to '2023-01-05' and use df.loc[selected_date] to access the row corresponding to that date.
The result is the data for the selected date.
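The selection can be sketched as follows. The DataFrame is rebuilt here with fixed values (10, 20, ..., 100) instead of random ones, purely so the result is easy to check; those values are an assumption, not the article's data.

```python
import pandas as pd

# Assumed sample data: ten daily dates with fixed values 10, 20, ..., 100
dates = pd.date_range(start="2023-01-01", periods=10, freq="D")
df = pd.DataFrame({"value": range(10, 110, 10)}, index=dates)

# Selecting data for a specific date by its label
selected_date = "2023-01-05"
selected_data = df.loc[selected_date]

print("Data for", selected_date, ":\n", selected_data)
```

With a DatetimeIndex, .loc accepts the date as a plain string, so there is no need to build a Timestamp by hand.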
Filtering Data Within a Date Range
Let us see how to filter data within a specific date range. We will be using slicing with the DataFrame, and we will retrieve rows that fall within the range from '2023-01-03' to '2023-01-08'.
# Filtering data within a date range
date_range_filter = df['2023-01-03':'2023-01-08']
print("\nData within date range:\n", date_range_filter)
The output shows the rows from '2023-01-03' through '2023-01-08'. With label-based slicing on a DatetimeIndex, both endpoints are included.
Resampling Data
Resampling converts the data to a different frequency, here a lower one. Calling .resample('W') groups the daily data into weekly bins, and .mean() calculates the mean for each week.
# Resampling data to weekly frequency and calculating mean
weekly_mean = df.resample('W').mean()
print("\nWeekly mean:\n", weekly_mean)
The output displays the weekly resampled data along with the mean values.
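To see how the weekly bins are formed, here is a small self-contained sketch. The fixed values 1 to 10 and the start date 2023-01-01 (a Sunday) are assumptions chosen so the means can be verified by hand.

```python
import pandas as pd

# Assumed sample data: ten daily dates starting Sunday 2023-01-01, values 1..10
dates = pd.date_range(start="2023-01-01", periods=10, freq="D")
df = pd.DataFrame({"value": range(1, 11)}, index=dates)

# 'W' means weekly bins that end on Sunday; each bin is labeled by its end date
weekly_mean = df.resample("W").mean()
print(weekly_mean)
# 2023-01-01 -> 1.0   (only 1 January)
# 2023-01-08 -> 5.0   (2 to 8 January: mean of 2..8)
# 2023-01-15 -> 9.5   (9 and 10 January)
```

Note that the last bin is labeled 2023-01-15 even though the data stops on 2023-01-10, because the label is the end of the week, not the last observed date.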
Aggregating Data for the Entire Period
In this section, we will perform aggregation operations on the data for the entire time period. We will calculate the total sum and average value of the 'value' column using .sum() and .mean() functions.
# Aggregating data for the entire period
total_sum = df['value'].sum()
average_value = df['value'].mean()
print("\nTotal sum of values:", total_sum)
print("Average value:", average_value)
The output displays the total sum and average value of the 'value' column across the entire dataset.
What is Timedelta?
A timedelta represents a duration, that is, the difference between two times or dates. It can express a period of time such as one day, four hours, or five seconds, and it can be positive or negative.
Timedeltas are immutable, which means that they cannot be changed once they are created.
Syntax
The syntax for creating a pandas.Timedelta object is:
class pandas.Timedelta(value=<object object>, unit=None, **kwargs)
The constructor accepts three parameters: value, unit, and **kwargs.
Parameters
Let us look at each parameter:
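value: a number, a string, or another timedelta object (pandas.Timedelta, datetime.timedelta, or numpy.timedelta64). It can be omitted when the duration is given through keyword arguments instead.
unit: the unit of value when value is a number, for example 'D' (days), 'h' (hours), or 's' (seconds). The default is nanoseconds.
**kwargs: duration components given as keywords: weeks, days, hours, minutes, seconds, milliseconds, microseconds, and nanoseconds. A Timedelta carries no time zone, so there is no tzinfo argument.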
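As a quick illustration of these parameters, the sketch below builds the same two-day duration in three ways. The variable names are illustrative only.

```python
import pandas as pd

# value + unit: a number interpreted in the given unit
from_number = pd.Timedelta(2, unit="D")

# value as a string: pandas parses the duration text
from_string = pd.Timedelta("2 days")

# keyword arguments only: no value is passed at all
from_kwargs = pd.Timedelta(days=2)

print(from_number == from_string == from_kwargs)  # True

# Keyword arguments can be combined
mixed = pd.Timedelta(days=1, hours=6)
print(mixed)  # 1 days 06:00:00
```

All three forms produce equal objects, so the choice between them is mostly a matter of where the duration comes from (a numeric column, user text, or literal values in code).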
Example
Let us see an example of how to use the pandas.Timedelta class:
The example imports the pandas library (and numpy, where it is needed for other operations), then creates a Timedelta object, td2, which represents 2 days, 5 hours, and 13 minutes.
Printing td2 shows how the Timedelta object is initialised and what duration it holds.
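The original listing is not reproduced here, so the following is a reconstruction under assumptions: it uses keyword arguments to build td2 and leaves numpy out, since this sketch does not need it.

```python
import pandas as pd

# td2 represents 2 days, 5 hours, and 13 minutes
td2 = pd.Timedelta(days=2, hours=5, minutes=13)

print(td2)             # 2 days 05:13:00
print(td2.components)  # the duration broken down into days, hours, minutes, ...
```

The components attribute is handy when individual fields are needed, for example td2.components.hours.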
Timedeltas Arithmetic Operations
Arithmetic operations allow us to manipulate time durations, calculate past dates, and compare time spans. Arithmetic operations like addition and subtraction can be done using timedelta and datetime objects.
Let us see an example to understand it better; we will be performing addition and subtraction:
from datetime import datetime, timedelta
# Define a base date
base_date = datetime(2023, 8, 1)
# Define a timedelta
delta = timedelta(days=10)
# Adding timedelta to the base date
new_date = base_date + delta
print("New date:", new_date)
# Subtracting timedelta from the base date
previous_date = base_date - delta
print("Previous date:", previous_date)
Adding the timedelta to the base date results in a new date that is 10 days into the future (11 August 2023).
Subtracting the timedelta from the base date gives a previous date that is 10 days in the past (22 July 2023).
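The same arithmetic works with the pandas types. The sketch below mirrors the example with pd.Timestamp and pd.Timedelta and also shows a comparison between two time spans; the variable names are illustrative.

```python
import pandas as pd

base = pd.Timestamp("2023-08-01")
delta = pd.Timedelta(days=10)

later = base + delta     # Timestamp + Timedelta gives a Timestamp
earlier = base - delta   # Timestamp - Timedelta gives a Timestamp
span = later - earlier   # Timestamp - Timestamp gives a Timedelta

print(later)    # 2023-08-11 00:00:00
print(earlier)  # 2023-07-22 00:00:00
print(span)     # 20 days 00:00:00

# Timedeltas can be compared directly: 20 days is longer than 2 weeks
print(span > pd.Timedelta(weeks=2))  # True
```

Subtracting two timestamps is the usual way to obtain a Timedelta from real data, for example the time elapsed between two events.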
Timedelta.seconds Property
The Timedelta.seconds property returns the seconds component of a timedelta: the number of seconds left over once whole days have been removed, so it always lies between 0 and 86399. It is not the total length of the duration in seconds; for that, use the total_seconds() method.
Because it is a property, it takes no parameters, and it returns an integer.
Consider the example below:
from datetime import timedelta
import pandas as pd
# Creating timedelta objects
delta1 = timedelta(days=5, hours=3, minutes=30)
delta2 = pd.Timedelta('2 days 4 hours 15 minutes 30 seconds')
# Accessing the seconds component using Timedelta.seconds
seconds1 = delta1.seconds
seconds2 = delta2.seconds
print("Seconds in delta1:", seconds1)
print("Seconds in delta2:", seconds2)
In the above example, delta1 represents a timedelta of 5 days, 3 hours, and 30 minutes, and delta2 represents a timedelta of 2 days, 4 hours, 15 minutes, and 30 seconds.
By using the seconds property, we extract the seconds component of each timedelta (the hours, minutes, and seconds expressed in seconds, with the days excluded) and print it in the output.
Output:
Seconds in delta1: 12600
Seconds in delta2: 15330
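Since .seconds leaves out whole days, it is worth comparing it with total_seconds(). The sketch below uses the same two durations; delta2 is built with keyword arguments here rather than a string, which is an equivalent form chosen for this illustration.

```python
from datetime import timedelta
import pandas as pd

delta1 = timedelta(days=5, hours=3, minutes=30)
delta2 = pd.Timedelta(days=2, hours=4, minutes=15, seconds=30)

# .days and .seconds are separate components of the same duration
print(delta1.days, delta1.seconds)  # 5 12600
print(delta2.days, delta2.seconds)  # 2 15330

# total_seconds() covers the whole duration, days included
print(delta1.total_seconds())  # 444600.0  (5 * 86400 + 12600)
print(delta2.total_seconds())  # 188130.0  (2 * 86400 + 15330)
```

When the full length of a duration is needed, total_seconds() is usually the right choice; .seconds is mainly useful together with .days.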
Frequently Asked Questions
Is Time-based indexing important in time series data?
Yes, time-based indexing is important in time series data because it allows easy retrieval of data within specific intervals of time, which makes analysis more effective.
How to create a timedelta object in Pandas?
To create a timedelta object, we can call the pd.Timedelta constructor with a number and a unit or with keyword arguments such as days and hours, or pass it a duration string such as '2 days 4 hours'.
How is missing data handled during resampling of time series data?
Pandas provides several options for handling missing data during resampling. Aggregation functions such as mean() skip missing values within a bin, and bins that end up empty can be filled afterwards, for example by forward filling, to maintain data integrity.
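A small sketch of this behaviour, assuming a series with one missing day (the dates and values are made up for illustration):

```python
import pandas as pd

# Assumed sample data: 2023-01-03 is missing from the index
s = pd.Series(
    [1.0, 2.0, 4.0],
    index=pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-04"]),
)

# Resampling to daily frequency creates a bin for the missing day; its mean is NaN
daily = s.resample("D").mean()
print(daily)

# Forward filling carries the last known value into the empty bin
filled = daily.ffill()
print(filled)
```

Forward filling suits values that persist until they change, such as a price or a status; for other data, interpolation or leaving the NaN in place may be more appropriate.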
Conclusion
Congratulations, you did a fantastic job! This article is a guide to Time Series and Timedelta in Pandas. We discussed four basic time series operations (selecting, filtering, resampling, and aggregating) and covered Timedelta with working examples. Finally, we answered some frequently asked questions.