
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Resample Time Series Data in Python
Time series data is a sequence of observations collected over time at regular intervals. This data can be of any domain such as finance, economics, health, and environmental science. The time series data we collect can sometimes be of different frequencies or resolutions, which may not be suitable for our analysis and data modeling process. In such cases, we can Resample our time series data by changing the frequencies or resolution of the time series by either upsampling or downsampling. This article will explain different methods to upsample or downsample the time series data.
Upsampling
Upsampling means increasing the frequency of the time series data. This is usually done when we need a higher resolution or more frequent observations. Python provides several methods for upsampling time series data, including linear interpolation, nearest neighbor interpolation, and polynomial interpolation.
Syntax
DataFrame.resample(rule, *args, **kwargs) DataFrame.asfreq(freq, method=None) DataFrame.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None)
Here,
The resample function is a method provided by the pandas library to resample time series data. It is applied on a DataFrame and takes the rule parameter, which specifies the desired frequency for resampling. Additional arguments (*args) and keyword arguments (**kwargs) can be provided to customize the resampling behavior, such as specifying the aggregation method or handling missing values.
The asfreq method is used in conjunction with the resample function to convert the frequency of the time series data. It takes the freq parameter, which specifies the desired frequency string for the output. The optional method parameter allows specifying how to handle any missing values introduced during the resampling process, such as forward filling, backward filling, or interpolation.
The interpolate method is used to fill missing values or gaps in the time series data. It performs interpolation based on the specified method (e.g., 'linear', 'nearest', 'spline') to estimate the values between existing observations. Additional parameters allow controlling the axis along which interpolation is performed, the limit on consecutive NaN values to be filled, and whether to modify the DataFrame inplace or return a new DataFrame.
Linear Interpolation
Linear interpolation is used for upsampling time series data. It fills the gaps between data points by drawing straight lines between them. The resample function from the pandas library can be used to achieve linear interpolation.
Example
In the below example, we have a time series DataFrame with three observations on non?consecutive dates. We convert the 'Date' column to a datetime format and set it as the index. The resample function is used to upsample the data to a daily frequency ('D') using the asfreq method. Finally, the interpolate method with the 'linear' option fills the gaps between the data points using linear interpolation. The DataFrame, df_upsampled, contains the upsampled time series data with interpolated values.
import pandas as pd # Create a sample time series DataFrame data = {'Date': ['2023-06-01', '2023-06-03', '2023-06-06'], 'Value': [10, 20, 30]} df = pd.DataFrame(data) df['Date'] = pd.to_datetime(df['Date']) df.set_index('Date', inplace=True) # Upsample the data using linear interpolation df_upsampled = df.resample('D').asfreq().interpolate(method='linear') # Print the upsampled DataFrame print(df_upsampled)
Output
Value Date 2023-06-01 10.000000 2023-06-02 15.000000 2023-06-03 20.000000 2023-06-04 23.333333 2023-06-05 26.666667 2023-06-06 30.000000
Nearest Neighbor Interpolation
Nearest neighbor interpolation is a simple method that fills the gaps between data points with the nearest available observation. This method can be useful when the time series exhibits abrupt changes or when the order of observations matters. The interpolate method in pandas can be used with the 'nearest' option to perform nearest neighbor interpolation.
Example
In the above example, we use the same original DataFrame as before. After resampling with the 'D' frequency, the interpolate method with the 'nearest' option fills the gaps by copying the nearest available observation. The resulting DataFrame, df_upsampled, now has a daily frequency with the nearest neighbor interpolation.
import pandas as pd # Create a sample time series DataFrame data = {'Date': ['2023-06-01', '2023-06-03', '2023-06-06'], 'Value': [10, 20, 30]} df = pd.DataFrame(data) df['Date'] = pd.to_datetime(df['Date']) df.set_index('Date', inplace=True) # Upsample the data using nearest neighbor interpolation df_upsampled = df.resample('D').asfreq().interpolate(method='nearest') # Print the upsampled DataFrame print(df_upsampled)
Output
Value Date 2023-06-01 10.0 2023-06-02 10.0 2023-06-03 20.0 2023-06-04 20.0 2023-06-05 30.0 2023-06-06 30.0
Down Sampling
Downsampling is used to decrease the frequency of the time series data, usually to obtain a broader view of the data or to simplify analysis. Python provides different downsampling techniques, such as taking the mean, sum, or maximum value within a specified time interval.
Syntax
DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Here, Aggregation methods, such as mean, sum, or max, are applied after resampling to calculate a single value representing the grouped observations within each resampled interval. These methods are typically used when downsampling the data. They can be applied directly to the resampled DataFrame or combined with the resample function to aggregate the data based on a specific frequency, such as weekly or monthly, by specifying the appropriate rule.
Mean Downsampling
Mean downsampling calculates the average value of the data points within each interval. This method is useful when dealing with high?frequency data and obtaining a representative value for each interval. The resample function combined with the mean method can be used to perform mean downsampling.
Example
In the below example, we start with a daily time series DataFrame spanning the entire month of June 2023. The resample function with the 'W' frequency downsamples the data to weekly intervals. By applying the mean method, we obtain the average value within each week. The resulting DataFrame, df_downsampled, contains the mean-downsampled time series data.
import pandas as pd # Create a sample time series DataFrame with daily frequency data = {'Date': pd.date_range(start='2023-06-01', end='2023-06-30', freq='D'), 'Value': range(30)} df = pd.DataFrame(data) df.set_index('Date', inplace=True) # Downsampling using mean df_downsampled = df.resample('W').mean() # Print the downsampled DataFrame print(df_downsampled)
Output
Value Date 2023-06-04 1.5 2023-06-11 7.0 2023-06-18 14.0 2023-06-25 21.0 2023-07-02 27.0
Maximum Downsampling
Maximum downsampling calculates and sets the highest value within each interval. This method is suitable for identifying peak values or extreme events in the time series. Using max instead of mean or sum in the previous example allows us to perform maximum downsampling.
Example
In the below example, we start with a daily time series DataFrame spanning the entire month of June 2023. The resample function with the 'W' frequency downsamples the data to weekly intervals. By applying the max method, we obtain the Maximum value within each week. The resulting DataFrame, df_downsampled, contains the maximum-downsampled time series data.
import pandas as pd # Create a sample time series DataFrame with daily frequency data = {'Date': pd.date_range(start='2023-06-01', end='2023-06-30', freq='D'), 'Value': range(30)} df = pd.DataFrame(data) df.set_index('Date', inplace=True) # Downsampling using mean df_downsampled = df.resample('W').max() # Print the downsampled DataFrame print(df_downsampled)
Output
Value Date 2023-06-04 3 2023-06-11 10 2023-06-18 17 2023-06-25 24 2023-07-02 29
Conclusion
In this article, we discussed how we can resample time series data using Python. Python provides various upsampling and downsampling techniques. We explored linear and nearest?neighbor interpolation for upsampling and mean and maximum interpolation for downsampling. You can use any of the upsampling or downsampling technique depending on the problem at hand.