Time Series
This project focuses on a practical and insightful analysis of Superstore sales data
spanning four years. The dataset includes daily records of customer orders, segmented
by product categories such as Furniture, Office Supplies, and Technology, along with
corresponding sales values and order dates. This rich dataset offers an ideal foundation
to explore time-based behavior and performance across product lines.
Using Python's powerful Pandas library, along with NumPy, Matplotlib, and Seaborn, this notebook demonstrates how to parse dates, build a DatetimeIndex, subset data by date, pivot and resample multiple series, apply transformations, and visualize the results. By the end of this lab, we aim to extract actionable insights about trends, seasonality, and category-level sales behavior.
This analysis provides a foundational example of how time series tools in Pandas can be
applied to real-world business data. It’s not only a technical exercise but also a
demonstration of how temporal analysis contributes to strategic decision-making in
retail and sales operations.
Learning Outcomes
file:///C:/Users/ayobola.lawal_kuda/Downloads/Final_Pandas_TimeSeries_by_Ayobola_Lawal.html 1/27
3/24/25, 11:11 AM Notebook
Pandas has built-in Time Series functionality for working with dates, date ranges, and Time Series data. It is useful for analyzing groups of time series and manipulating data.
Setup
In [ ]: # imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from datetime import datetime
from datetime import timedelta
from dateutil.relativedelta import relativedelta
from IPython.display import display
import os
os.chdir('data')
from colorsetup import colors, palette
sns.set_palette(palette)
# ignore warnings
warnings.filterwarnings('ignore')
pd.options.display.float_format = '{:,.1f}'.format
%matplotlib inline
plotsize = (13, 5)
In [ ]: df = pd.read_excel("Sample - Superstore.xls")
df.columns
Out[ ]: Index(['Row ID', 'Order ID', 'Order Date', 'Ship Date', 'Ship Mode',
'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State',
'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category',
'Product Name', 'Sales', 'Quantity', 'Discount', 'Profit'],
dtype='object')
We can see the data have been loaded and that the columns are referenced by a Pandas Index object. There are two date variables (Order Date and Ship Date), variables for customer and region, product-type variables (Category, Sub-Category, Product Name), and so on.
Note that we reset the index; if we don't, Pandas moves the group variables into the index (more on this later). The result is a Pandas DataFrame with columns for Order Date, Category, and Sales. We can think of this as a Sales time series for each Category.
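The cell that builds base did not survive the export; a minimal sketch of what the grouping step likely looks like, using a tiny hypothetical stand-in for the Superstore data:

```python
import pandas as pd

# Tiny stand-in for the Superstore data (hypothetical values).
df_demo = pd.DataFrame({
    'Order Date': pd.to_datetime(['2011-01-04', '2011-01-04', '2011-01-05']),
    'Category': ['Office Supplies', 'Furniture', 'Office Supplies'],
    'Sales': [16.4, 100.0, 288.1],
})

# Sum Sales per day and Category; reset_index() turns the group keys
# back into ordinary columns instead of leaving them in a MultiIndex.
base = (df_demo.groupby(['Order Date', 'Category'])['Sales']
               .sum()
               .reset_index())
print(base.columns.tolist())  # ['Order Date', 'Category', 'Sales']
```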
In [ ]: print("Columns:", base.columns)
print("Index:", base.index)
In [ ]: base.head()
Individual DataFrame columns are Pandas Series , and we can see the RangeIndex
on the left. This Pandas DataFrame is a combination of the RangeIndex and Pandas
Series objects, where each has an underlying data type:
In [ ]: base.dtypes
In [ ]: for x in base.columns:
    print(x, type(base[x]), base[x].dtype)
If starting from the NumPy arrays, we could build the DataFrame (note dictionary input
structure):
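A sketch of the dictionary-input construction (the array values here are illustrative, not from the dataset):

```python
import numpy as np
import pandas as pd

# Illustrative arrays standing in for the notebook's variables.
order_date = np.array(['2011-01-04', '2011-01-05'], dtype='datetime64[ns]')
category = np.array(['Office Supplies', 'Furniture'])
sales = np.array([16.4, 2573.8])

# Dictionary input: keys become column names, values become columns.
df_from_numpy = pd.DataFrame({'Order Date': order_date,
                              'Category': category,
                              'Sales': sales})
print(df_from_numpy.dtypes)
```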
In [ ]: df_from_numpy.dtypes
While a NumPy array and a Pandas Series hold essentially the same values, the Series adds an index and formats the date output.
In [ ]: order_date
In [ ]: order_date_daily
The order_date variable now has daily precision, although this changes little here because we already had at most one observation per day. In practice, leaving the default nanosecond precision is usually fine.
In [ ]: order_date_monthly
In [ ]: np.unique(order_date_monthly)
In [ ]: len(np.unique(order_date_monthly))
Out[ ]: 48
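The daily and monthly versions shown above can be produced by casting the NumPy datetime64 unit; a minimal sketch with illustrative dates:

```python
import numpy as np

order_date = np.array(['2011-01-04', '2011-01-05', '2011-02-01'],
                      dtype='datetime64[ns]')

# Truncate to day and month precision by changing the datetime64 unit.
order_date_daily = order_date.astype('datetime64[D]')
order_date_monthly = order_date.astype('datetime64[M]')

print(order_date_monthly)                  # ['2011-01' '2011-01' '2011-02']
print(len(np.unique(order_date_monthly)))  # 2
```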
In [ ]: print(base.head())
print('\n Unique categories:')
print(base['Category'].unique())
Unique categories:
['Office Supplies' 'Furniture' 'Technology']
In [ ]: base.head()
In [ ]: print(base.index)
#print(base.index.unique())
Subsetting data
We now have a DatetimeIndex and we can use it to select data subsets:
In [ ]: # Observations in 2011
print(base['2011'].head())
print('\n')
# Observations in a range of dates, subset of columns:
print(base[base['Category'] == 'Office Supplies']['2011':'2012-02'].head())
Category Sales
Order Date
2011-01-04 Office Supplies 16.4
2011-01-05 Office Supplies 288.1
2011-01-06 Office Supplies 19.5
2011-01-07 Furniture 2,573.8
2011-01-07 Office Supplies 685.3
Category Sales
Order Date
2011-01-04 Office Supplies 16.4
2011-01-05 Office Supplies 288.1
2011-01-06 Office Supplies 19.5
2011-01-07 Office Supplies 685.3
2011-01-08 Office Supplies 10.4
Datetime Components
Pandas Datetime variables have a number of useful components. Using the
DatetimeIndex, we can extract items like month, year, day of week, quarter, etc.:
Week: Int64Index([ 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
...
52, 1, 1, 1, 1, 1, 1, 1, 1, 1],
dtype='int64', name='Order Date', length=2864)
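A sketch of extracting these components from a DatetimeIndex (three illustrative dates):

```python
import pandas as pd

idx = pd.DatetimeIndex(['2011-01-04', '2011-04-15', '2011-12-31'],
                       name='Order Date')

print('Year:   ', idx.year.tolist())       # [2011, 2011, 2011]
print('Month:  ', idx.month.tolist())      # [1, 4, 12]
print('Quarter:', idx.quarter.tolist())    # [1, 2, 4]
print('Weekday:', idx.dayofweek.tolist())  # Monday=0 ... Sunday=6
print('Week:   ', idx.isocalendar().week.tolist())
```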
While data built from existing variables may be sufficient, some Time Series applications require that the data contain all periods and have a Frequency assigned. We can see above that our data do not have a frequency (freq=None). While the data look daily, there are many possible frequencies (calendar days, business days, weekly, etc.). If the input dates are already complete and evenly spaced, Pandas will infer a Frequency and assign it. Otherwise, we need to ensure there are no duplicate dates and no missing periods in the index.
Setting a Frequency helps ensure the data are standardized and will work in applications, and is also required for functionality like resampling.
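For example, Pandas only infers a frequency when the dates are complete and evenly spaced (hypothetical dates below):

```python
import pandas as pd

full = pd.DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'])
gappy = pd.DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-04'])

print(pd.infer_freq(full))   # 'D'
print(pd.infer_freq(gappy))  # None -- the gap breaks inference

# asfreq fills the missing day and attaches an explicit frequency.
s = pd.Series([1.0, 2.0, 3.0], index=gappy).asfreq('D', fill_value=0)
print(s.index.freq)
```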
Pivoting Data:
Because there are multiple categories, we have multiple Time Series to analyze. As a result, our DatetimeIndex does not uniquely identify an observation. To uniquely identify observations, we can either add the categorical variable to the Index, or keep a DatetimeIndex and create a separate column for each series. There are several ways to accomplish this. The first approach uses Pandas' built-in pivot method:
Note that missing values ( NaN ) are often introduced here; they can easily be set to 0 using the fillna(0) method.
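A minimal sketch of the pivot approach on a few hypothetical rows:

```python
import pandas as pd

base = pd.DataFrame({
    'Order Date': pd.to_datetime(['2011-01-04', '2011-01-07', '2011-01-07']),
    'Category': ['Office Supplies', 'Furniture', 'Office Supplies'],
    'Sales': [16.4, 2573.8, 685.3],
})

# One column per Category, indexed by Order Date; a day with no sales
# in some category becomes NaN, which fillna(0) replaces.
sales = (base.pivot(index='Order Date', columns='Category', values='Sales')
             .fillna(0))
print(sales)
```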
Unstacking:
To achieve the same result, it is often easier to use the Index with the unstack and stack methods. The unstack method transforms long data into wide data by creating a column for each value of the chosen index level, while stack does the reverse.
Here, we can tell Pandas that the Date and Category values are part of the Index and
use the unstack function to generate separate columns (this also removes the
Category column from the Index):
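A minimal sketch of the unstack approach on the same kind of hypothetical data:

```python
import pandas as pd

base = pd.DataFrame({
    'Order Date': pd.to_datetime(['2011-01-04', '2011-01-07', '2011-01-07']),
    'Category': ['Office Supplies', 'Furniture', 'Office Supplies'],
    'Sales': [16.4, 2573.8, 685.3],
})

# Put both Date and Category in the index, then unstack Category out
# into columns (removing it from the index).
sales = (base.set_index(['Order Date', 'Category'])['Sales']
             .unstack('Category'))
print(sales.columns.tolist())  # ['Furniture', 'Office Supplies']
```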
In [ ]: print(sales.index)
print('\nUnique dates in our data: ', len(sales.index.unique()), 'Days')
Since we have now created a column for each category, there are no longer repeated values in the Datetime Index.
To use this index, we need to tell Pandas how to treat missing values. In this case, we
want to use zero for days without sales data.
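A sketch of filling the gaps with zeros while attaching a daily frequency (illustrative values):

```python
import pandas as pd

sales = pd.Series([16.4, 685.3],
                  index=pd.to_datetime(['2011-01-04', '2011-01-07']),
                  name='Sales')

# Expand to a full daily calendar; days with no sales become 0.
sales_new = sales.asfreq('D', fill_value=0)
print(sales_new)
print(sales_new.index.freq)
```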
In [ ]: sales_new.index
We can see the result now has a daily frequency. While some Time Series models will work without an explicit frequency, many will not. It also helps ensure we aren't missing important dates when summarizing and plotting the data.
Resampling
We can now easily resample our data at any desired frequency, using either the asfreq method or the resample method. The asfreq method assumes a default fill approach (which can be dangerous); the resample method lets this be specified directly, generating a resampler object. To get values back, we need to specify an aggregation function if downsampling (moving to a lower frequency), or a fill function if upsampling (moving to a higher frequency). This is typically sum or mean for downsampling, and interpolate or a forward/back fill for upsampling. We generate results for some common frequencies below:
In [ ]: sales_weekly = sales_new.resample('W').sum()
print('Weekly Sales')
print(sales_weekly.head(), '\n')
sales_monthly = sales_new.resample('M').sum()
print('Monthly Sales')
print(sales_monthly.head(), '\n')
sales_quarterly = sales_new.resample('Q').sum()
print('Quarterly Sales')
print(sales_quarterly.head(), '\n')
sales_annual = sales_new.resample('Y').sum()
print('Annual Sales')
print(sales_annual.head())
Weekly Sales
Furniture Office Supplies Technology
2011-01-09 2,650.5 1,019.8 1,147.9
2011-01-16 1,003.8 2,039.4 827.9
2011-01-23 1,747.3 871.1 824.1
2011-01-30 550.2 680.3 343.3
2011-02-06 290.7 502.7 649.9
Monthly Sales
Furniture Office Supplies Technology
2011-01-31 5,951.9 4,851.1 3,143.3
2011-02-28 2,130.3 1,071.7 1,608.5
2011-03-31 14,574.0 8,605.9 32,511.2
2011-04-30 7,944.8 11,155.1 9,195.4
2011-05-31 6,912.8 7,135.6 9,599.9
Quarterly Sales
Furniture Office Supplies Technology
2011-03-31 22,656.1 14,528.7 37,263.0
2011-06-30 28,063.7 31,243.7 27,231.3
2011-09-30 41,957.9 53,924.0 47,751.4
2011-12-31 64,515.1 52,080.0 63,032.6
2012-03-31 27,374.1 23,059.4 18,418.2
Annual Sales
Furniture Office Supplies Technology
2011-12-31 157,192.9 151,776.4 175,278.2
2012-12-31 170,518.2 137,233.5 162,780.8
2013-12-31 198,901.4 183,510.6 226,061.8
2014-12-31 215,387.3 246,526.6 272,033.2
In [ ]: # Note that upsampling (from Annual to Monthly, for example) produces missing values
sales_monthly_from_annual = sales_annual.resample('M').asfreq()
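Moving from annual to monthly creates new periods with no data; a sketch of making the gaps explicit and interpolating them (illustrative values):

```python
import pandas as pd

annual = pd.Series([120.0, 240.0],
                   index=pd.to_datetime(['2011-12-31', '2012-12-31']))

# Resampling to month-end introduces NaN for the 11 new months.
monthly = annual.resample('M').asfreq()
print(monthly.isna().sum())           # 11

# Linear interpolation is one simple fill choice.
print(monthly.interpolate().iloc[1])  # 130.0
```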
In [ ]: sales_daily = sales.asfreq('D')
sales_businessday = sales.asfreq('B')
sales_hourly = sales.asfreq('h')
# This will generate missing values:
sales_hourly.head()
Variable Transformations
For Time Series models, we may want to use transformed variables (log, difference,
growth rate, etc). The example below illustrates how we might generate these variables in
Pandas, using the Monthly Sales dataset.
Stationarity Transformations
Concerns about Stationarity often lead to variable transformations. Some commonly-used transformations (differencing, percentage change, and log) are implemented below. Because the Index has several levels here, these transformations can be applied to every outcome variable in one line (the results could then be joined together using the Pandas concat method).
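A sketch of differencing and percentage change, using the first few Furniture values from the monthly table above:

```python
import pandas as pd

sales_monthly = pd.DataFrame(
    {'Furniture': [5951.9, 2130.3, 14574.0]},
    index=pd.to_datetime(['2011-01-31', '2011-02-28', '2011-03-31']))

# First difference: month-over-month change in level.
print(sales_monthly.diff())

# Percentage change: monthly growth rate (first row is NaN).
print(sales_monthly.pct_change().round(1))
```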
# Log Sales
print('\nlog(1+Monthly Sales) \n', np.log(1 + sales_monthly).head())
log(1+Monthly Sales)
Furniture Office Supplies Technology
2011-01-31 8.7 8.5 8.1
2011-02-28 7.7 7.0 7.4
2011-03-31 9.6 9.1 10.4
2011-04-30 9.0 9.3 9.1
2011-05-31 8.8 8.9 9.2
Out[ ]:
            Furniture  Office Supplies  Technology  Furniture_%_Change  Office Supplies_%_Change  Techn...
Order Date
2011-01-31    5,951.9          4,851.1     3,143.3                 nan                       nan
2011-02-28    2,130.3          1,071.7     1,608.5                -0.6                      -0.8
2011-03-31   14,574.0          8,605.9    32,511.2                 5.8                       7.0
2011-04-30    7,944.8         11,155.1     9,195.4                -0.5                       0.3
2011-05-31    6,912.8          7,135.6     9,599.9                -0.1                      -0.4
In [ ]: window_size = 7
rolling_window = sales_new.rolling(window_size)
print('Rolling Mean')
print(rolling_window.mean().dropna().head())
Rolling Mean
Furniture Office Supplies Technology
2011-01-10 378.6 147.0 168.4
2011-01-11 386.1 145.1 168.4
2011-01-12 387.5 103.9 168.4
2011-01-13 387.5 101.1 168.4
2011-01-14 145.5 292.8 96.8
Cumulative Sales
Furniture Office Supplies Technology
2011-01-04 0.0 16.4 0.0
2011-01-05 0.0 304.5 0.0
2011-01-06 0.0 324.0 0.0
2011-01-07 2,573.8 1,009.4 1,147.9
2011-01-08 2,650.5 1,019.8 1,147.9
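The cumulative table above is a running sum; the cell itself did not survive the export, but it is presumably a cumsum over the zero-filled daily data. A sketch with the first few Office Supplies values:

```python
import pandas as pd

sales_new = pd.DataFrame(
    {'Office Supplies': [16.4, 288.1, 19.5]},
    index=pd.to_datetime(['2011-01-04', '2011-01-05', '2011-01-06']))

# Running total per column -- reproduces the 304.5 and 324.0 shown above.
print(sales_new.cumsum())
```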
Visualization
Here we explore methods for plotting Time Series data. Most of these plotting approaches use Matplotlib's pyplot library under the hood, even when it is not called directly. This means plot features, like the title, can be adjusted using pyplot commands.
Here, we plot functions like rolling averages and cumulative Sales calculated above:
Using the source data, set up Monthly data for Sales and Profit by Segment by either (1)
Resampling or (2) Grouping data by Year and Month.
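A sketch of both approaches on hypothetical data (df_demo stands in for the Superstore frame):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Order Date': pd.to_datetime(['2011-01-04', '2011-01-20', '2011-02-03']),
    'Segment': ['Consumer', 'Corporate', 'Consumer'],
    'Sales': [100.0, 200.0, 300.0],
    'Profit': [10.0, 20.0, 30.0],
})

# (1) Pivot to one column per Segment, then resample to month end.
prof_month = (df_demo.pivot_table(columns='Segment', index='Order Date')
                     .resample('M').sum())
print(prof_month)

# (2) Group by Year and Month directly.
by_ym = df_demo.groupby([df_demo['Order Date'].dt.year,
                         df_demo['Order Date'].dt.month])[['Sales', 'Profit']].sum()
print(by_ym)
```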
                 Profit                               Sales
Segment        Consumer  Corporate  Home Office  Consumer  Corporate  Home Office
Order Date
2011-01-31        106.5        5.9        185.0   1,304.1      568.0        855.9
2011-02-28        228.3      126.0         37.9   1,442.7      464.1        104.1
2011-03-31        -26.5      131.3         73.7   3,777.8    1,988.4      4,439.9
2011-04-30        336.9      435.6        527.9   3,752.8    3,951.2      2,031.6
2011-05-31        484.0      873.0        -63.3   5,373.2    4,077.7        696.1
Analyze the results from the first exercise to determine whether Autocorrelation or Seasonal patterns differ by Segment, or depending on whether we look at Sales or Profit.
In [ ]: from statsmodels.graphics.tsaplots import month_plot, plot_acf

fig, axes = plt.subplots(9, 2, figsize=(20, 15))
for i, cat in enumerate(['Consumer', 'Corporate', 'Home Office']):
    for j, money in enumerate(['Sales', 'Profit']):
        axes[i, j].plot(prof_month[money, cat])
        axes[i, j].title.set_text(cat + " " + money)
        plot_acf(prof_month[money, cat], ax=axes[i + 3, j],
                 title=cat + " " + money + " ACF")
        month_plot(prof_month[money, cat], ax=axes[i + 6, j])
fig.tight_layout()
plt.show()
Seasonal patterns across groups are pretty similar and there is very little autocorrelation.
Use the result from Exercise 2 to develop an EDA function to explore other variables (like
Region or Sub-Category) that may be of interest.
In [ ]: cat_var = 'Region'
date_var = 'Order Date'
money_vars = ['Profit', 'Sales']

def monthly_eda(cat_var=cat_var,
                date_var=date_var,
                money_vars=money_vars):
    new_vars = [cat_var, date_var] + money_vars
    cats = list(df[cat_var].unique())
    num_cats = len(cats)
    new_base = df[new_vars].set_index(date_var)
    prof_pivot = new_base.pivot_table(columns=cat_var, index=date_var)
    prof_month = prof_pivot.resample('M').sum()
    # Plot each money variable for each category (the plotting body was
    # lost in the export; this mirrors the earlier per-Segment cell)
    fig, axes = plt.subplots(num_cats, len(money_vars),
                             figsize=(20, 3 * num_cats), squeeze=False)
    for i, cat in enumerate(cats):
        for j, money in enumerate(money_vars):
            axes[i, j].plot(prof_month[money, cat])
            axes[i, j].title.set_text(str(cat) + " " + money)
    fig.tight_layout()
    plt.show()
In [ ]: monthly_eda(cat_var='Region')
In [ ]: monthly_eda(cat_var='Sub-Category')
Key Takeaways:
Transforming raw daily sales data into monthly aggregates using .resample('M')
simplified the data structure and highlighted long-term trends and seasonality.
This enabled a clearer view of cyclical patterns, helping to identify months with
consistently high or low sales performance.
The three main product categories (Furniture, Office Supplies, and Technology) showed distinct temporal behaviors.
Technology exhibited more volatility but had stronger peaks, suggesting it may be
driven by promotions or year-end demand.
Furniture revealed sporadic spikes, which may correlate with bulk purchases or
corporate outfitting cycles.
These insights are valuable for targeted marketing and stock replenishment
strategies per category.
Applying a 6-month moving average helped to reduce short-term noise and focus
on the underlying trend.
Understanding these patterns allows business units to better plan inventory, staffing,
and promotional campaigns.
Unusual spikes or dips identified in the plots may represent opportunities for further
analysis—e.g., did a promotion succeed? Did supply chain disruptions impact
delivery?