Python | Pandas dataframe.resample()
Last Updated :
22 Oct, 2019
Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages.
Pandas is one of those packages and makes importing and analyzing data much easier.
The Pandas dataframe.resample() function is primarily used for time series data.
A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, it is a sequence taken at successive, equally spaced points in time. resample() is a convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or datetime-like values must be passed through the on or level keyword.
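As a minimal sketch of the index requirement described above, the snippet below builds a small made-up DataFrame and resamples it from daily to two-day frequency. The column name value and the dates are illustrative assumptions, not part of the dataset used later in this article. It also shows that resampling a frame without a datetime-like index raises a TypeError.

```python
import pandas as pd

# Hypothetical data: six daily observations with a DatetimeIndex
idx = pd.date_range("2018-01-01", periods=6, freq="D")
daily = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6]}, index=idx)

# Downsample from daily to two-day bins and sum the values in each bin
two_day = daily.resample("2D").sum()
print(two_day)

# A frame with the default integer index cannot be resampled
plain = pd.DataFrame({"value": [1, 2, 3]})
try:
    plain.resample("D").sum()
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

Each two-day bin is labelled by its first day, so the six daily rows collapse into three rows.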
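Syntax : DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)
Note : this is the signature from older pandas releases. The how, fill_method and limit arguments were later removed in favour of calling a method on the result (for example .mean() or .ffill()), and loffset and base were deprecated in pandas 1.1 in favour of offset and origin.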
Parameters :
rule : the offset string or object representing target conversion
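axis : int, optional, default 0. The axis to resample along.
closed : {‘right’, ‘left’}. Which side of each bin interval is closed.
label : {‘right’, ‘left’}. Which bin edge is used to label each bucket.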
convention : For PeriodIndex only, controls whether to use the start or end of rule
loffset : Adjust the resampled time labels
base : For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0.
on : For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.
level : For a MultiIndex, level (name or number) to use for resampling. Level must be datetime-like.
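The effect of on, closed and label is easier to see on a tiny example. The following is only a sketch on made-up data (the column names date and value are assumptions for illustration), using a two-day rule so that every bin can be checked by hand.

```python
import pandas as pd

# Hypothetical data: the dates live in an ordinary column, not in the index
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=4, freq="D"),
    "value": [1, 2, 3, 4],
})

# on="date" tells resample() to bin by that column instead of the index
left_closed = df.resample("2D", on="date")["value"].sum()

# closed="right" makes each bin include its right edge instead of its left,
# so the first observation lands in a bin that starts before the data
right_closed = df.resample("2D", on="date", closed="right")["value"].sum()

# label="right" keeps the default bins but names each one by its right edge
right_label = df.resample("2D", on="date", label="right")["value"].sum()

print(left_closed)
print(right_closed)
print(right_label)
```

With the default settings the four rows fall into two bins. Switching to closed="right" shifts the bin boundaries and produces three bins, while label="right" changes only the timestamps attached to the results.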
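Resampling changes the frequency of a time series. Downsampling moves to a lower frequency (for example, daily to monthly) and aggregates the values that fall into each new interval; upsampling moves to a higher frequency and usually requires filling or interpolating the new points. Many different frequencies can be applied when resampling time series data, which makes this an important technique in analytics.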
The most commonly used time series frequencies are:
W : weekly frequency
M : month end frequency
SM : semi-month end frequency (15th and end of month)
Q : quarter end frequency
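Many other time series frequencies are available. Note that pandas 2.2 and later spell the period-end aliases 'ME', 'SME' and 'QE', and warn when the older 'M', 'SM' and 'Q' are used. Let's see how to apply these frequencies to data and resample it.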
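To illustrate how a frequency alias shapes the result, here is a small sketch on synthetic data (the series name price and its values are made up for the example). It compares the calendar-based 'W' alias with a fixed '7D' window over the same fourteen days.

```python
import pandas as pd

# Hypothetical data: 14 daily values starting on Monday, 1 January 2018
idx = pd.date_range("2018-01-01", periods=14, freq="D")
prices = pd.Series(range(1, 15), index=idx, name="price")

# 'W' groups the days into calendar weeks ending on Sunday
# and labels each week by that Sunday
weekly_mean = prices.resample("W").mean()

# '7D' counts fixed 7-day windows from the first day
# and labels each window by its first day
seven_day_mean = prices.resample("7D").mean()

print(weekly_mean)
print(seven_day_mean)
```

Both calls return the same two means here because the data starts on a Monday, but the labels differ: 'W' reports the week-ending Sundays, while '7D' reports the first day of each window.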
The examples below read a CSV file named apple.csv, which contains Apple stock price data for a duration of 1 year, from 13-11-17 to 13-11-18.
Example #1: Resampling the data on monthly frequency
Python3
# importing pandas as pd
import pandas as pd
# read_csv loads the "date" column as strings by default.
# parse_dates=["date"] converts that column to date-time format.
# Resampling needs a datetime-like index, so
# index_col="date" makes the "date" column the index of the data frame
df = pd.read_csv("apple.csv", parse_dates=["date"], index_col="date")
# Printing the first 10 rows of the dataframe
df[:10]
Python3
# Resampling the time series data by month,
# applied to the stock closing price
# 'M' indicates month-end frequency
monthly_resampled_data = df.close.resample('M').mean()
# the above command finds the mean closing price
# of each calendar month covered by the data.
monthly_resampled_data
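Since apple.csv is not reproduced here, the same monthly resampling can be tried on a small synthetic stand-in. The sketch below makes two assumptions: the prices are invented (a flat 10.0 in January and 20.0 in February, so the expected means are obvious), and month_end_alias() is a hypothetical helper, not a pandas function. The helper is there because newer pandas releases spell month-end frequency 'ME' and warn about 'M', while older releases only accept 'M'.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def month_end_alias():
    """Hypothetical helper: return 'ME' where pandas accepts it, else 'M'."""
    try:
        to_offset("ME")
        return "ME"
    except ValueError:
        return "M"


# Hypothetical stand-in for apple.csv: one row per day for two months
idx = pd.date_range("2018-01-01", "2018-02-28", freq="D")
close = [10.0 if day.month == 1 else 20.0 for day in idx]
df = pd.DataFrame({"close": close}, index=idx)
df.index.name = "date"

# Mean closing price per calendar month, labelled by the month's last day
monthly_mean = df["close"].resample(month_end_alias()).mean()
print(monthly_mean)
```

The result has one row per calendar month, indexed by the last day of that month.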
Example #2: Resampling the data on weekly frequency
Python3
# importing pandas as pd
import pandas as pd
# Resampling needs a datetime-like index, so parse the "date"
# column as dates; index_col="date" makes it the index of the data frame
df = pd.read_csv("apple.csv", parse_dates=["date"], index_col="date")
# Resampling the time series data at weekly frequency,
# applied to the stock opening price. 'W' indicates week
weekly_resampled_data = df.open.resample('W').mean()
# find the mean opening price of each week
# over a period of 1 year.
weekly_resampled_data
Example #3: Resampling the data on Quarterly frequency
Python3
# importing pandas as pd
import pandas as pd
# Resampling needs a datetime-like index, so parse the "date"
# column as dates; index_col="date" makes it the index of the data frame
df = pd.read_csv("apple.csv", parse_dates=["date"], index_col="date")
# Resampling the time series data
# at quarterly frequency
# 'Q' indicates quarter-end frequency
quarterly_resampled_data = df.open.resample('Q').mean()
# mean opening price of each quarter
# over a period of 1 year.
quarterly_resampled_data