Time Series Analysis Book

This document provides a summary of time series analysis. It begins with an introduction to time series analysis, including its objectives and assumptions. It then discusses the key components of time series analysis like trend, seasonality, and cyclical patterns. The document also covers topics like stationary and non-stationary time series, methods to check stationarity, and how to convert non-stationary data to stationary. It describes techniques like moving average methodology, auto-regressive models, and ARIMA models. The document concludes with frequently asked questions about time series analysis.


Helwan University

Faculty of Science
Mathematics Department

Time Series

Prepared by

Dr. Ahmed Fawzy

Publisher: University Book Publishing and Distribution Authority, Helwan University

Copyright reserved by the author

2023
Introduction
Time Series Analysis (TSA) is a way of studying the characteristics of a response
variable with time as the independent variable. To estimate or forecast the target
variable, the time variable is used as the reference point. A time series is ordered
by time, whether in years, months, weeks, days, hours, minutes, or seconds; it is a
sequence of observations taken at successive, discrete time intervals. Real-world
applications of TSA include weather forecasting models, stock market prediction,
signal processing, and control systems. Because TSA deals with information that
arrives in a particular sequence, it is distinct from spatial and other analyses. We
can predict the future using AR, MA, ARMA, and ARIMA models. In this article, we will
be decoding time series analysis for you.

Learning Objectives

• We will discuss in detail TSA objectives, assumptions, and components
(stationary and non-stationary).
• We will look at the TSA algorithms.
• Finally, we will look at specific use cases in Python.

This article was published as a part of the Data Science Blogathon.


Table of contents

• What Is Time Series Analysis?


• How to Analyze Time Series?
• Significance of Time Series
• Components of Time Series Analysis
• What Are the Limitations of Time Series Analysis?
• Data Types of Time Series
• Methods to Check Stationarity
• Converting Non-Stationary Into Stationary
• Moving Average Methodology
• Time Series Analysis in Data Science and Machine Learning
• What Is an Auto-Regressive Model?
• Implementation of Auto-Regressive Model
• Implementation of Moving Average (Weights – Simple Moving Average)
• Understanding ARMA and ARIMA
• Understand the signature of ARIMA
• Process Flow (Re-Gap)
• Conclusion
• Frequently Asked Questions
What Is Time Series Analysis?
Time series analysis is a specific way of analyzing a sequence of data points
collected over time. In TSA, analysts record data points at consistent intervals over
a set period rather than just recording the data points intermittently or randomly.

Objectives of Time Series Analysis

• To understand how time series works and what factors affect a certain
variable(s) at different points in time.
• Time series analysis provides insights into how the given dataset's features
change over time.
• It supports predicting the future values of the time series variable.
• Assumptions: There is only one assumption in TSA, which is stationarity,
meaning that the statistical properties of the process do not depend on the
origin of time.

How to Analyze Time Series?


To perform the time series analysis, we have to follow the following steps:

• Collecting the data and cleaning it


• Preparing Visualization with respect to time vs key feature
• Observing the stationarity of the series
• Developing charts to understand its nature.
• Model building – AR, MA, ARMA and ARIMA
• Extracting insights from prediction
Significance of Time Series
TSA is the backbone for prediction and forecasting analysis, specific to time-based
problem statements.

• Analyzing the historical dataset and its patterns


• Understanding and matching the current situation with patterns derived from
the previous stage.
• Understanding the factor or factors influencing certain variable(s) in different
periods.

With the help of “Time Series,” we can prepare numerous time-based analyses and
results.

• Forecasting: Predicting any value for the future.


• Segmentation: Grouping similar items together.
• Classification: Classifying a set of items into given classes.
• Descriptive analysis: Analysis of a given dataset to find out what is there in
it.
• Intervention analysis: Effect of changing a given variable on the outcome.

Components of Time Series Analysis


Let’s look at the various components of Time Series Analysis:

• Trend: A long-term increase or decrease in the data over the whole timeline,
with no fixed interval. The trend can be positive, negative, or null.
• Seasonality: Regular shifts that repeat at fixed intervals within a continuous
timeline; the pattern may look like a bell curve or a sawtooth.
• Cyclical: Movements with no fixed interval, whose duration and pattern are
uncertain.
• Irregularity: Unexpected situations, events, or spikes occurring over a short
time span.

What Are the Limitations of Time Series Analysis?


Time series has the below-mentioned limitations; we have to take care of those
during our data analysis.

• Like many other models, TSA does not support missing values.
• The data points must have a linear relationship.
• Data transformations are mandatory, which makes the analysis somewhat expensive.
• Models mostly work on univariate data.

Data Types of Time Series


Let’s discuss the time series’ data types and their influence. There are two major
types: stationary and non-stationary.

Stationary: A dataset is stationary when it satisfies the thumb rules below and has
no trend, seasonality, cyclical, or irregular components.

• The mean should be constant over the period of analysis.
• The variance should be constant with respect to the time frame.
• The covariance (which measures the relationship between two observations) should
depend only on the lag between them, not on time.

Non-stationary: If the mean, variance, or covariance changes with respect to time,
the dataset is called non-stationary.

Methods to Check Stationarity


During the TSA model preparation workflow, we must assess whether the dataset is
stationary or not. This is done using Statistical Tests. There are two tests available
to test if the dataset is stationary:

• Augmented Dickey-Fuller (ADF) Test


• Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test
Augmented Dickey-Fuller (ADF) Test or Unit Root Test
The ADF test is the most popular statistical test. It is done with the following
assumptions:

• Null Hypothesis (H0): Series is non-stationary
• Alternate Hypothesis (HA): Series is stationary
  o p-value > 0.05: fail to reject H0 (series is non-stationary)
  o p-value <= 0.05: reject H0 (series is stationary)

Kwiatkowski–Phillips–Schmidt–Shin (KPSS) Test


The KPSS test takes the null hypothesis (H0) that the time series is stationary
around a deterministic trend, against the alternative of a unit root. Note that the
two tests reverse the roles of the hypotheses. Since TSA requires stationary data for
its further analysis, we have to ensure that the dataset is stationary.
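
Both tests are available in the Python statsmodels library. Below is a minimal
sketch (not part of the original workflow) that assumes the series of interest is
stored in a one-dimensional pandas Series called series, a placeholder name:

Code

from statsmodels.tsa.stattools import adfuller, kpss

# `series` is a placeholder name for a univariate pandas Series (an assumption)
adf_stat, adf_pvalue, *rest = adfuller(series)
print('ADF p-value: %.4f' % adf_pvalue)    # <= 0.05 suggests stationarity (reject H0)

kpss_stat, kpss_pvalue, *rest = kpss(series, regression='c')
print('KPSS p-value: %.4f' % kpss_pvalue)  # <= 0.05 suggests non-stationarity (reject H0)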

Converting Non-Stationary Into Stationary


Let’s discuss quickly how to convert non-stationary to stationary for effective time
series modeling. There are three methods available for this conversion – detrending,
differencing, and transformation.

Detrending
It involves removing the trend effects from the given dataset and showing only the
differences in values from the trend. This makes it easier to identify cyclical
patterns.
Differencing
This is a simple transformation of the series into a new time series, which we use to
remove the series' dependence on time and stabilize its mean; trend and seasonality
are reduced by this transformation.

• Y't = Yt - Yt-1
• where Yt is the value at time t

Transformation
This includes three different methods: Power Transform, Square Root, and Log
Transform. The most commonly used one is the Log Transform.
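
As a minimal sketch of these ideas in pandas (assuming the series is stored in a
single-column DataFrame named df, a placeholder name):

Code

import numpy as np
import pandas as pd

# `df` is a placeholder single-column DataFrame (an assumption)
df_diff = df.diff().dropna()            # first-order differencing: Y't = Yt - Yt-1
df_log = np.log(df)                     # log transform to stabilize the variance
df_log_diff = df_log.diff().dropna()    # differencing the log-transformed series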

Moving Average Methodology


The most commonly used time series method is the Moving Average. This method smooths
out random short-term variations and is closely associated with the components of a
time series.

The Moving Average (MA), or Rolling Mean: the value of the MA is calculated by
averaging the data of the time series over k periods.

Let’s see the types of moving averages:

• Simple Moving Average (SMA),


• Cumulative Moving Average (CMA)
• Exponential Moving Average (EMA)

Simple Moving Average (SMA)


The Simple Moving Average (SMA) calculates the unweighted mean of the previous
M or N points. The size of the sliding window is chosen based on the amount of
smoothing desired; increasing M or N improves smoothing but reduces accuracy.

To understand better, I will use the air temperature dataset.


Code

import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
df_temperature = pd.read_csv('temperature_TSA.csv', encoding='utf-8')
df_temperature.head()
Output

Code

df_temperature.info()
Output

Code

# set index for year column


df_temperature.set_index('Any', inplace=True)
df_temperature.index.name = 'year'
# Yearly average air temperature - calculation
df_temperature['average_temperature'] = df_temperature.mean(axis=1)
# drop unwanted columns and reset the dataframe
df_temperature = df_temperature[['average_temperature']]
df_temperature.head()
Output
Code

# SMA over a period of 10 and 20 years
df_temperature['SMA_10'] = df_temperature.average_temperature.rolling(10, min_periods=1).mean()
df_temperature['SMA_20'] = df_temperature.average_temperature.rolling(20, min_periods=1).mean()

# green = average air temperature, red = 10-year SMA, orange = 20-year SMA
colors = ['green', 'red', 'orange']
# Line plot
df_temperature.plot(color=colors, linewidth=3, figsize=(12,6))
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.legend(labels=['Average air temperature', '10-years SMA', '20-years SMA'], fontsize=14)
plt.title('The yearly average air temperature in city', fontsize=20)
plt.xlabel('Year', fontsize=16)
plt.ylabel('Temperature [°C]', fontsize=16)
Output
Cumulative Moving Average (CMA)
The CMA is the unweighted mean of past values till the current time.
Code

# CMA Air temperature


df_temperature['CMA'] = df_temperature.average_temperature.expanding().mean()

# green = average air temperature, orange = CMA
colors = ['green', 'orange']
# line plot
df_temperature[['average_temperature', 'CMA']].plot(color=colors, linewidth=3, figsize=(12,6))
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.legend(labels =['Average Air Temperature', 'CMA'], fontsize=14)
plt.title('The yearly average air temperature in city', fontsize=20)
plt.xlabel('Year', fontsize=16)
plt.ylabel('Temperature [°C]', fontsize=16)
Output

Exponential Moving Average (EMA)


EMA is mainly used to identify trends and filter out noise. The weights of the
elements decrease gradually over time, meaning it gives more weight to recent data
points and less to historical ones. Compared with the SMA, the EMA reacts faster to
change and is more sensitive.

α --> smoothing factor.

• It has a value between 0 and 1.
• It represents the weighting applied to the most recent period.

Let's apply the exponential moving average with smoothing factors of 0.1 and 0.3
to the given dataset.

Code

# EMA Air Temperature

# smoothing factor - 0.1
df_temperature['EMA_0.1'] = df_temperature.average_temperature.ewm(alpha=0.1, adjust=False).mean()
# smoothing factor - 0.3
df_temperature['EMA_0.3'] = df_temperature.average_temperature.ewm(alpha=0.3, adjust=False).mean()

# green = average air temperature, red = EMA (alpha=0.1), yellow = EMA (alpha=0.3)
colors = ['green', 'red', 'yellow']
df_temperature[['average_temperature', 'EMA_0.1', 'EMA_0.3']].plot(color=colors, linewidth=3, figsize=(12,6), alpha=0.8)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.legend(labels=['Average air temperature', 'EMA - alpha=0.1', 'EMA - alpha=0.3'], fontsize=14)
plt.title('The yearly average air temperature in city', fontsize=20)
plt.xlabel('Year', fontsize=16)
plt.ylabel('Temperature [°C]', fontsize=16)
Output
Time Series Analysis in Data Science and Machine Learning
When dealing with TSA in Data Science and Machine Learning, there are multiple
model options available, among them the Autoregressive Moving Average (ARMA)
family of models with parameters p, d, and q.

• p ==> number of autoregressive lags
• q ==> number of moving average lags
• d ==> order of differencing

Before we get to know ARIMA, you should first understand the terms below.

• Auto-Correlation Function (ACF)


• Partial Auto-Correlation Function (PACF)

Auto-Correlation Function (ACF)


ACF indicates how similar a value within a given time series is to the previous
values. In other words, it measures the degree of similarity between a given time
series and a lagged version of that time series at the various intervals we observe.

The Python statsmodels library calculates autocorrelation. It identifies trends in
the given dataset and the influence of previously observed values on the currently
observed values.

Partial Auto-Correlation (PACF)


PACF is similar to the Auto-Correlation Function but a little more challenging to
understand. It shows the correlation of the sequence with itself at a given lag,
keeping only the direct effect and removing all intermediary effects from the given
time series.

Auto-Correlation and Partial Auto-Correlation


Code

plot_acf(df_temperature)
plt.show()

plot_acf(df_temperature, lags=30)
plt.show()
Output

Observation

The previous temperature values influence the current temperature, but the strength
of that influence decays as the lag increases, rising again slightly at regular
seasonal intervals, as seen in the visualization above.
Types of Auto-Correlation

Interpret ACF and PACF Plots

ACF                     PACF                    Suggested model
Declines gradually      Drops instantly         Auto-Regressive (AR) model
Drops instantly         Declines gradually      Moving Average (MA) model
Declines gradually      Declines gradually      ARMA model
Drops instantly         Drops instantly         You wouldn't fit any of these models

Remember that both ACF and PACF require stationary time series for analysis.
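
Although this article only plots the ACF, the PACF can be plotted the same way with
statsmodels; a minimal sketch reusing the df_temperature data from earlier:

Code

from statsmodels.graphics.tsaplots import plot_pacf

plot_pacf(df_temperature['average_temperature'], lags=30)
plt.show()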

What Is an Auto-Regressive Model?


An auto-regressive model is a simple model that predicts future performance based
on past performance. It is mainly used for forecasting when there is some correlation
between values in a given time series and those that precede and succeed (back and
forth).

An AR model is a linear regression model that uses lagged variables as input. Given
the input, such a linear regression model could be built with the scikit-learn
library, but the statsmodels library provides autoregression-specific functions where
you must specify an appropriate lag value and train the model. This is provided in
the AutoReg class, which gives results in a few simple steps:

• Create the model with AutoReg().
• Call fit() to train it on our dataset; it returns an AutoRegResults object.
• Once fit, make a prediction by calling the predict() function.

The equation for the AR model (compare with Y = mX + c):

Yt = C + b1*Yt-1 + b2*Yt-2 + ... + bp*Yt-p + Ert

Key Parameters

• p = number of past values used
• Yt = value at time t, a function of the past values
• Ert = error term at time t
• C = intercept

Let's check whether the given dataset or time series is random or not.

Code

from matplotlib import pyplot


from pandas.plotting import lag_plot
lag_plot(df_temperature)
pyplot.show()
Output
Observation

Yes, it looks random and scattered.

Implementation of Auto-Regressive Model


Code

#import libraries
from matplotlib import pyplot
from statsmodels.tsa.ar_model import AutoReg
from sklearn.metrics import mean_squared_error
from math import sqrt
# load csv as dataset
#series = read_csv('daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
# split dataset for test and training
X = df_temperature.values
train, test = X[1:len(X)-7], X[len(X)-7:]
# train autoregression
model = AutoReg(train, lags=20)
model_fit = model.fit()
print('Coefficients: %s' % model_fit.params)
# Predictions
predictions = model_fit.predict(start=len(train), end=len(train)+len(test)-1, dynamic=False)
for i in range(len(predictions)):
    print('predicted=%f, expected=%f' % (predictions[i], test[i]))
rmse = sqrt(mean_squared_error(test, predictions))
print('Test RMSE: %.3f' % rmse)
# plot results
pyplot.plot(test)
pyplot.plot(predictions, color='red')
pyplot.show()
Output

predicted=15.893972, expected=16.275000
predicted=15.917959, expected=16.600000
predicted=15.812741, expected=16.475000
predicted=15.787555, expected=16.375000
predicted=16.023780, expected=16.283333
predicted=15.940271, expected=16.525000
predicted=15.831538, expected=16.758333
Test RMSE: 0.617

Observation

Expected (blue) against predicted (red). The forecast tracks well up to about the
4th day and deviates from around the 6th day.
Implementation of Moving Average (Weights – Simple Moving Average)
Code

import numpy as np
alpha = 0.3
n = 10
w_sma = np.repeat(1/n, n)
colors = ['green', 'yellow']
# weights - exponential moving average alpha=0.3 adjust=False
w_ema = [(1-alpha)**i if i == n-1 else alpha*(1-alpha)**i for i in range(n)]
pd.DataFrame({'w_sma': w_sma, 'w_ema': w_ema}).plot(color=colors, kind='bar', figsize=(8,5))
plt.xticks([])
plt.yticks(fontsize=10)
plt.legend(labels=['Simple moving average', 'Exponential moving average (α=0.3)'], fontsize=10)
# title and labels
plt.title('Moving Average Weights', fontsize=10)
plt.ylabel('Weights', fontsize=10)
Output
Understanding ARMA and ARIMA
ARMA is a combination of the Auto-Regressive and Moving Average models for
forecasting. This model provides a weakly stationary stochastic process in terms of
two polynomials, one for the Auto-Regressive and the second for the Moving
Average.

ARMA is best for predicting stationary series. ARIMA was thus developed to
support both stationary as well as non-stationary series.

• AR ==> Uses past values to predict the future.


• MA ==> Uses past error terms in the given series to predict the future.
• I==> Uses the differencing of observation and makes the stationary data.

AR+I+MA= ARIMA
Understand the signature of ARIMA

• p ==> lag order => number of lag observations.
• d ==> degree of differencing => number of times the raw observations are
differenced.
• q ==> order of moving average => the size of the moving average window.

Implementation Steps for ARIMA

• Plot a time series format


• Difference to make stationary on mean by removing the trend
• Make stationary by applying log transform.
• Difference log transform to make as stationary on both statistic mean and
variance
• Plot ACF & PACF, and identify the potential AR and MA model
• Discovery of best fit ARIMA model
• Forecast/Predict the value using the best fit ARIMA model
• Plot ACF & PACF for residuals of the ARIMA model, and ensure no more
information is left.

Implementation of ARIMA in Python


We have already discussed steps 1-5 which will remain the same; let’s focus on the
rest here.

Code

from statsmodels.tsa.arima_model import ARIMA
# note: this module was removed in statsmodels >= 0.13; on newer versions use
# `from statsmodels.tsa.arima.model import ARIMA` instead

model = ARIMA(df_temperature, order=(0, 1, 1))
results_ARIMA = model.fit()
results_ARIMA.summary()
Output

Code

results_ARIMA.forecast(3)[0]
Output

array([16.47648941, 16.48621826, 16.49594711])


Code

results_ARIMA.plot_predict(start=200)
plt.show()
Output
Process Flow (Re-Gap)
In recent years, the use of Deep Learning for Time Series Analysis and Forecasting
has increased to resolve problem statements that couldn’t be handled using Machine
Learning techniques. Let’s discuss this briefly.

The Recurrent Neural Network (RNN) is the most traditional and widely accepted
architecture for time-series forecasting problems.

An RNN is organized into successive layers, divided into

• Input
• Hidden
• Output

Each layer shares the same weights, and every neuron is assigned to a fixed time
step. The input and output layers are fully connected to the hidden layer at the
same time step, and the hidden layers are connected forward through time.

Components of RNN

• Input: The vector x(t) is the input at time step t.
• Hidden:
  o The vector h(t) is the hidden state at time t.
  o It acts as a kind of memory of the network.
  o It is calculated from the current input x(t) and the previous time step's
    hidden state h(t-1).
• Output: The vector y(t) is the output at time step t.
• Weights: the input vector is connected to the hidden-layer neurons at time t by
  a weight matrix U; the hidden-layer neurons at times t-1 and t are connected by
  a weight matrix W; and the hidden layer is connected to the output vector y(t)
  at time t by a weight matrix V. All the weight matrices U, W, and V are shared
  across time steps.
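
As a minimal NumPy sketch of a single vanilla RNN step built from these definitions
(the dimensions, random weights, and tanh nonlinearity are illustrative assumptions,
not taken from any specific implementation):

Code

import numpy as np

n_in, n_hidden, n_out = 3, 5, 1           # illustrative dimensions (assumptions)
rng = np.random.default_rng(0)
U = rng.normal(size=(n_hidden, n_in))     # input -> hidden weights
W = rng.normal(size=(n_hidden, n_hidden)) # hidden(t-1) -> hidden(t) weights
V = rng.normal(size=(n_out, n_hidden))    # hidden -> output weights

def rnn_step(x_t, h_prev):
    # new hidden state from the current input x(t) and previous hidden state h(t-1)
    h_t = np.tanh(U @ x_t + W @ h_prev)
    # output y(t) computed from the hidden state
    y_t = V @ h_t
    return h_t, y_t

h = np.zeros(n_hidden)
for x_t in rng.normal(size=(4, n_in)):    # a toy sequence of 4 time steps
    h, y = rnn_step(x_t, h)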

Advantages of RNN

• It has the special feature of remembering past information, which makes RNNs
very useful for time series prediction.
• Perfect for capturing complex patterns in the input time series dataset.
• Fast in prediction/forecasting.
• Not strongly affected by missing values, so the cleansing process can be limited.

Disadvantages of RNN

• Training an RNN is the biggest challenge.
• It has an expensive computation cost.

Conclusion
A time series is constructed by data that is measured over time at evenly spaced
intervals. I hope this comprehensive guide has helped you all understand the time
series, its flow, and how it works. Although the TSA is widely used to handle data
science problems, it has certain limitations, such as not supporting missing values.
Note that the data points must be linear in their relationship for Time Series Analysis
to be done.

Ready to dive deeper into Time Series Analysis? Enhance your skills with Analytics
Vidhya’s comprehensive courses and unlock new possibilities in your data science
careers. Check out our courses today!
Key Takeaways

• Time series is a sequence of various data points that occurred in a successive


order for a given period of time.
• Trend, Seasonality, Cyclical, and Irregularity are components of TSA.

Frequently Asked Questions


Q1. What are the four main components of a time series?

A. The four main components of time series are Trend, Seasonality, Cyclical, and
Irregularity.

Q2. How do you do time series analysis step by step?

A. Here are the steps to analyze time series:

1. Collect the data and clean it.


2. Prepare visualization with respect to time vs. key feature.
3. Observe the stationarity of the series.
4. Develop charts to understand its nature.
5. Build the model – AR, MA, ARMA, and ARIMA.
6. Extract insights from prediction.

Q3. What are the 3 fundamental steps to model a time series?

A. The three fundamental steps to model a time series are :

1. Building a model for time series.


2. Validating the model
3. Using the model to forecast future values / impute missing values.
https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-to-time-
series-analysis/

Time series data often arise when monitoring industrial processes or tracking
corporate business metrics. The essential difference between modeling data via
time series methods or using the process monitoring methods discussed earlier in
this chapter is the following:
Time series analysis accounts for the fact that data points taken over time may
have an internal structure (such as autocorrelation, trend or seasonal variation)
that should be accounted for.
This section will give a brief overview of some of the more widely used techniques
in the rich and rapidly growing field of time series modeling and analysis.

Definition of Time Series: An ordered sequence of values of a variable at equally
spaced time intervals.

Applications: The usage of time series models is twofold:

• Obtain an understanding of the underlying forces and structure that produced
the observed data
• Fit a model and proceed to forecasting, monitoring or even feedback and
feedforward control.

Time Series Analysis is used for many applications such as:

• Economic Forecasting
• Sales Forecasting
• Budgetary Analysis
• Stock Market Analysis
• Yield Projections
• Process and Quality Control
• Inventory Studies
• Workload Projections
• Utility Studies
• Census Analysis

and many, many more...

There are many methods used to model and forecast time series
Techniques: The fitting of time series models can be an ambitious
undertaking. There are many methods of model fitting including
the following:

Univariate Time Series Models

Univariate Time Series

The term "univariate time series" refers to a time series that


consists of single (scalar) observations recorded sequentially
over equal time increments. Some examples are monthly
CO2 concentrations and southern oscillations to predict el nino
effects.

Although a univariate time series data set is usually given as a single column of
numbers, time is in fact an implicit variable in the time series. If the data are
equi-spaced, the time variable, or index, does not need to be explicitly given. The
time variable may sometimes be explicitly used for plotting the series. However, it
is not used in the time series model itself.

The analysis of time series where the data are not collected in
equal time increments is beyond the scope of this handbook.

Data Set of Monthly CO2 Concentrations

Source and Background: This data set contains selected monthly mean CO2
concentrations at the Mauna Loa Observatory from 1974 to 1987. The CO2
concentrations were measured by the continuous infrared analyser of the Geophysical
Monitoring for Climatic Change division of NOAA's Air Resources Laboratory. The
selection has been for an approximation of 'background conditions'. See Thoning et
al., "Atmospheric Carbon Dioxide at Mauna Loa Observatory: II Analysis of the
NOAA/GMCC Data 1974-1985", Journal of Geophysical Research (submitted) for details.

This dataset was received from Jim Elkins of NOAA in 1988.

Data: Each line contains the CO2 concentration (mixing ratio in dry air, expressed
in the WMO X85 mole fraction scale, maintained by the Scripps Institution of
Oceanography). In addition, it contains the year, month, and a numeric value for the
combined month and year. This combined date is useful for plotting purposes. The
reader can download the data as a text file.

CO2 Year&Month Year Month


--------------------------------------------------
333.13 1974.38 1974 5
332.09 1974.46 1974 6
331.10 1974.54 1974 7
329.14 1974.63 1974 8
327.36 1974.71 1974 9
327.29 1974.79 1974 10
328.23 1974.88 1974 11
329.55 1974.96 1974 12

330.62 1975.04 1975 1


331.40 1975.13 1975 2
331.87 1975.21 1975 3
333.18 1975.29 1975 4
333.92 1975.38 1975 5
333.43 1975.46 1975 6
331.85 1975.54 1975 7
330.01 1975.63 1975 8
328.51 1975.71 1975 9
328.41 1975.79 1975 10
329.25 1975.88 1975 11
330.97 1975.96 1975 12
331.60 1976.04 1976 1
332.60 1976.13 1976 2
333.57 1976.21 1976 3
334.72 1976.29 1976 4
334.68 1976.38 1976 5
334.17 1976.46 1976 6
332.96 1976.54 1976 7
330.80 1976.63 1976 8
328.98 1976.71 1976 9
328.57 1976.79 1976 10
330.20 1976.88 1976 11
331.58 1976.96 1976 12

332.67 1977.04 1977 1


333.17 1977.13 1977 2
334.86 1977.21 1977 3
336.07 1977.29 1977 4
336.82 1977.38 1977 5
336.12 1977.46 1977 6
334.81 1977.54 1977 7
332.56 1977.63 1977 8
331.30 1977.71 1977 9
331.22 1977.79 1977 10
332.37 1977.88 1977 11
333.49 1977.96 1977 12

334.71 1978.04 1978 1


335.23 1978.13 1978 2
336.54 1978.21 1978 3
337.79 1978.29 1978 4
337.95 1978.38 1978 5
338.00 1978.46 1978 6
336.37 1978.54 1978 7
334.47 1978.63 1978 8
332.46 1978.71 1978 9
332.29 1978.79 1978 10
333.76 1978.88 1978 11
334.80 1978.96 1978 12
336.00 1979.04 1979 1
336.63 1979.13 1979 2
337.93 1979.21 1979 3
338.95 1979.29 1979 4
339.05 1979.38 1979 5
339.27 1979.46 1979 6
337.64 1979.54 1979 7
335.68 1979.63 1979 8
333.77 1979.71 1979 9
334.09 1979.79 1979 10
335.29 1979.88 1979 11
336.76 1979.96 1979 12

337.77 1980.04 1980 1


338.26 1980.13 1980 2
340.10 1980.21 1980 3
340.88 1980.29 1980 4
341.47 1980.38 1980 5
341.31 1980.46 1980 6
339.41 1980.54 1980 7
337.74 1980.63 1980 8
336.07 1980.71 1980 9
336.07 1980.79 1980 10
337.22 1980.88 1980 11
338.38 1980.96 1980 12

339.32 1981.04 1981 1


340.41 1981.13 1981 2
341.69 1981.21 1981 3
342.51 1981.29 1981 4
343.02 1981.38 1981 5
342.54 1981.46 1981 6
340.88 1981.54 1981 7
338.75 1981.63 1981 8
337.05 1981.71 1981 9
337.13 1981.79 1981 10
338.45 1981.88 1981 11
339.85 1981.96 1981 12
340.90 1982.04 1982 1
341.70 1982.13 1982 2
342.70 1982.21 1982 3
343.65 1982.29 1982 4
344.28 1982.38 1982 5
343.42 1982.46 1982 6
342.02 1982.54 1982 7
339.97 1982.63 1982 8
337.84 1982.71 1982 9
338.00 1982.79 1982 10
339.20 1982.88 1982 11
340.63 1982.96 1982 12

341.41 1983.04 1983 1


342.68 1983.13 1983 2
343.04 1983.21 1983 3
345.27 1983.29 1983 4
345.92 1983.38 1983 5
345.40 1983.46 1983 6
344.16 1983.54 1983 7
342.11 1983.63 1983 8
340.11 1983.71 1983 9
340.15 1983.79 1983 10
341.38 1983.88 1983 11
343.02 1983.96 1983 12

343.87 1984.04 1984 1


344.59 1984.13 1984 2
345.11 1984.21 1984 3
347.07 1984.29 1984 4
347.38 1984.38 1984 5
346.78 1984.46 1984 6
344.96 1984.54 1984 7
342.71 1984.63 1984 8
340.86 1984.71 1984 9
341.13 1984.79 1984 10
342.84 1984.88 1984 11
344.32 1984.96 1984 12
344.88 1985.04 1985 1
345.62 1985.13 1985 2
347.23 1985.21 1985 3
347.62 1985.29 1985 4
348.53 1985.38 1985 5
347.87 1985.46 1985 6
346.00 1985.54 1985 7
343.86 1985.63 1985 8
342.55 1985.71 1985 9
342.57 1985.79 1985 10
344.11 1985.88 1985 11
345.49 1985.96 1985 12

346.04 1986.04 1986 1


346.70 1986.13 1986 2
347.38 1986.21 1986 3
349.38 1986.29 1986 4
349.93 1986.38 1986 5
349.26 1986.46 1986 6
347.44 1986.54 1986 7
345.55 1986.63 1986 8
344.21 1986.71 1986 9
343.67 1986.79 1986 10
345.09 1986.88 1986 11
346.27 1986.96 1986 12

347.33 1987.04 1987 1


347.82 1987.13 1987 2
349.29 1987.21 1987 3
350.91 1987.29 1987 4
351.71 1987.38 1987 5
350.94 1987.46 1987 6
349.10 1987.54 1987 7
346.77 1987.63 1987 8
345.73 1987.71 1987 9
Time Series Forecasting with an LSTM Recurrent Neural Network

Every day, humans make passive predictions when performing tasks such as
crossing a road, where they estimate the speed of cars and their distance from them,
or catching a ball by guessing its velocity and positioning their hands accordingly.
These skills are gained through experience and practice. However, predicting
complex phenomena like the weather or the economy can be difficult due to the
multitude of variables involved. Time series forecasting is used in such situations,
relying on historical data and mathematical models to make predictions about future
trends and patterns. In this article, we will walk through a forecasting example,
together with the underlying mathematical concepts, using an airline passengers
dataset.
Part 1: Mathematical Concepts

In the context of the time series forecasting algorithm used in this article, instead of
manually calculating the slope and intercept of the line, the algorithm uses a neural
network with LSTM layers to learn the underlying patterns and relationships in the
time series data. The neural network is trained on a portion of the data and then
used to make predictions for the remaining portion. In this algorithm, the prediction
for the next time step is based on the previous n_inputs time steps, which is similar
to the concept of using y(t) to predict y(t+1) in the linear regression example.
However, instead of using a simple linear equation, the prediction in this algorithm
is generated using the activation function of the LSTM layer. The activation
function allows the model to capture non-linear relationships in the data, making it
more effective in capturing complex patterns in time series data.
Activation Function


The activation function used in the LSTM model is the rectified linear unit (ReLU)
activation function. This activation function is commonly used in deep learning
models because of its simplicity and effectiveness in dealing with the vanishing
gradient problem. In the LSTM model, the ReLU activation function is applied to
the output of each LSTM unit to introduce non-linearity in the model and allow it to
learn complex patterns in the data. The ReLU function has a simple thresholding
behavior where any negative input is mapped to zero and any positive input is
passed through unchanged, making it computationally efficient.
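
In NumPy terms, this thresholding behavior is simply max(0, x) applied element-wise;
a minimal sketch:

import numpy as np

def relu(x):
    # negatives are mapped to zero, positives pass through unchanged
    return np.maximum(0, x)

relu(np.array([-2.0, -0.5, 0.0, 1.5]))   # -> array([0. , 0. , 0. , 1.5])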

Part 2: Implementation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('airline-passengers.csv', index_col='Month', parse_dates=True)
df.index.freq = 'MS'
df.shape
df.columns
plt.figure(figsize=(20, 4))
plt.plot(df.Passengers, linewidth=2)
plt.show()

The code imports three important libraries: numpy, pandas, and matplotlib. The
pandas library is used to read in the ‘airline-passengers.csv’ file and set the ‘Month’
column as the index, which allows the data to be analyzed over time. The code then
uses the matplotlib library to create a line plot showing the number of airline
passengers over time. Finally, the plot is displayed using the ‘plt.show’ function.
This code is useful for anyone interested in analyzing time series data, and it
demonstrates how to use pandas and matplotlib to visualize trends in data.

nobs = 12
df_train = df.iloc[:-nobs]
df_test = df.iloc[-nobs:]
df_train.shape
df_test.shape

This code creates two new data frames ‘df_train’ and ‘df_test’ by splitting an
existing time series data frame ‘df’ into training and testing sets. The ‘nobs’
variable is set to 12, which means that the last 12 observations of ‘df’ will be used
for testing, while the rest of the data will be used for training. The training set is
stored in ‘df_train’ and consists of all rows in ‘df’ except for the last 12 rows, while
the testing set is stored in ‘df_test’ and consists of only the last 12 rows of ‘df’. The
‘shape’ attribute is then used to print the number of rows and columns in each data
frame, which confirms that the splitting was done correctly. This code is useful for
preparing time series data for modeling and testing purposes by splitting it into two
sets.
Model Architecture

from keras.preprocessing.sequence import TimeseriesGenerator
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(df_train)
scaled_train = scaler.transform(df_train)
scaled_test = scaler.transform(df_test)
n_inputs = 12
n_features = 1
generator = TimeseriesGenerator(scaled_train, scaled_train, length = n_inputs,
batch_size =1)

for i in range(len(generator)):
    X, y = generator[i]
    print(f' \n {X.flatten()} and {y}')

This code snippet demonstrates how to use the ‘TimeseriesGenerator’ class from
Keras and the ‘MinMaxScaler’ class from scikit-learn to generate input and output
arrays for a time series forecasting model. The code first creates an instance of the
‘MinMaxScaler’ class and fits it to the training data set (‘df_train’) in order to scale
the data. The scaled data is then stored in ‘scaled_train’ and ‘scaled_test’ data
frames. The number of time steps (‘n_inputs’) is set to 12, and the number of
features (‘n_features’) is set to 1. A ‘TimeseriesGenerator’ object is created with the
‘scaled_train’ data and a window length of ‘n_inputs’ and a batch size of 1. Finally,
a loop is used to iterate over the ‘generator’ object and print out the input and
output arrays for each time step. The ‘X’ and ‘y’ variables represent the input and
output arrays for each time step, respectively. The ‘flatten()’ method is used to
convert the input array into a 1D array for easier printing. Overall, this code is
useful for preparing time series data for forecasting models using a sliding window
approach.
X.shape

This code returns the shape of an array or matrix ‘X’. The ‘shape’ attribute is a
property of NumPy arrays and returns a tuple representing the dimensions of the
array. The code does not provide any additional context, so it is unclear what the
shape of ‘X’ is. The output will be in the format (rows, columns).

from keras.models import Sequential


from keras.layers import Dense
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(200, activation='relu', input_shape = (n_inputs, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

model.summary()

This code demonstrates how to create an LSTM neural network model for time
series forecasting using Keras. Firstly, the necessary Keras classes are imported,
including ‘Sequential’, ‘Dense’, and ‘LSTM’. The model is created as a
‘Sequential’ object and an LSTM layer is added with 200 neurons, the ‘relu’
activation function, and an input shape defined by ‘n_inputs’ and ‘n_features’. The
LSTM layer output is then passed to a ‘Dense’ layer with a single output neuron.
The model is compiled with the ‘adam’ optimizer and the mean squared error
(‘mse’) loss function. The ‘summary()’ method is used to display a summary of the
architecture, including the number of parameters and the shapes of the input and
output tensors for each layer. This code can be useful for creating an LSTM model
for time series forecasting, as it provides an easy-to-follow example that can be
adapted to different data sets and forecasting problems.
Training Phase

model.fit(generator, epochs = 50)

This code trains the LSTM neural network model using the ‘fit()’ method in Keras
for 50 epochs. The ‘TimeseriesGenerator’ object generates batches of input/output
pairs for the model to learn from. The ‘fit()’ method updates the model parameters
using backpropagation based on the loss function and optimizer defined during
model compilation. By training the model, it learns to make predictions on new,
unseen data based on patterns learned in the training data.
plt.plot(model.history.history['loss'])

last_train_batch = scaled_train[-12:]

last_train_batch = last_train_batch.reshape(1, 12, 1)

last_train_batch
model.predict(last_train_batch)

This code uses the trained LSTM neural network model to make predictions on a
new data point. The last 12 data points from the training data are selected, scaled,
and reshaped into the appropriate format for the model. The ‘predict()’ method is
called on the model with the reshaped data as input, and the output is the predicted
value for the next time step in the time series. This is an essential step in using the
LSTM model for time series forecasting.

scaled_test[0]

This code prints the first element of the scaled test data array. The ‘scaled_test’
variable is a NumPy array of the test data that has been transformed using the
‘MinMaxScaler’ object. Printing the first element of this array shows the scaled
value for the first time step in the test data.

Forecasting

y_pred = []

first_batch = scaled_train[-n_inputs:]
current_batch = first_batch.reshape(1, n_inputs, n_features)
for i in range(len(scaled_test)):
    batch = current_batch
    pred = model.predict(batch)[0]
    y_pred.append(pred)
    current_batch = np.append(current_batch[:, 1:, :], [[pred]], axis=1)

y_pred

scaled_test

This code generates predictions for the test data using the trained LSTM model. It
uses a for loop to loop over each element in the scaled test data. In each iteration,
the current batch is used to make a prediction using the ‘predict()’ method of the
model. The predicted value is then added to the ‘y_pred’ list and the current batch is
updated. Finally, the ‘y_pred’ list is printed along with the ‘scaled_test’ data to
compare the predicted values with the actual values. This step is crucial in
evaluating the performance of the LSTM model on the test data.
df_test

y_pred_transformed = scaler.inverse_transform(y_pred)

y_pred_transformed = np.round(y_pred_transformed,0)

y_pred_final = y_pred_transformed.astype(int)

y_pred_final

This code transforms the predicted values generated in the previous step back to the
original scale using the ‘inverse_transform()’ method of the scaler object. The
transformed values are rounded to the nearest integer using the ‘round()’ function
and converted to integers using the ‘astype()’ method. The resulting array of
predicted values, ‘y_pred_final’, is printed to show the final predicted values for the
test data. This step is important for evaluating the accuracy of the LSTM model’s
predictions on the original scale of the data.
df_test.values, y_pred_final

df_test['Predictions'] = y_pred_final

df_test

The code above shows the predicted values generated by the LSTM model being
added to the original test dataset. First, the ‘values’ attribute is used to extract the
values of the ‘df_test’ dataframe, which are then paired with the predicted values
‘y_pred_final’. Then, a new column called ‘Predictions’ is added to the ‘df_test’
dataframe to store the predicted values. Finally, the ‘df_test’ dataframe is printed
with the newly added ‘Predictions’ column. This step is important to visually
compare the actual values of the test dataset with the predicted values and evaluate
the accuracy of the model.
plt.figure(figsize=(15, 6))
plt.plot(df_train.index, df_train.Passengers, linewidth=2, color='black', label='Train Values')
plt.plot(df_test.index, df_test.Passengers, linewidth=2, color='green', label='True Values')
plt.plot(df_test.index, df_test.Predictions, linewidth=2, color='red', label='Predicted Values')
plt.legend()
plt.show()

This code block is generating a plot using the matplotlib library. It first sets the
figure size, and then plots the training data as a black line, the true test values as a
green line, and the predicted test values as a red line. It also adds a legend to the
plot and displays it using the show() method.
Mean Squared Error

The mean squared error (MSE) is a measure of how close a regression line is to
a set of points. It’s calculated by taking the average of the squared differences
between the predicted and actual values. The square root of the MSE is known
as the root mean squared error (RMSE), which is a popular measure of the
accuracy of predictions. In this code block, the RMSE is calculated using
the mean_squared_error function from the sklearn.metrics module and
the sqrt function from the math module. The RMSE is used to evaluate the
accuracy of the LSTM model's predictions compared to the true values in the
test set.

from sklearn.metrics import mean_squared_error


from math import sqrt

sqrt(mean_squared_error(df_test.Passengers, df_test.Predictions))

This code calculates the root mean squared error (RMSE) between the actual
passenger values in the test set (df_test.Passengers) and the predicted passenger
values (df_test.Predictions). RMSE is a commonly used metric to evaluate the
performance of regression models. It measures the average distance between the
predicted values and the actual values, taking into account the square of the
differences between them. RMSE is a useful metric because it penalizes large errors
more heavily than small errors, making it a good indicator of the overall accuracy
of a model's predictions.

In conclusion, we have implemented a time series forecasting model using the


LSTM algorithm in Keras. We trained the model on the monthly airline passenger
dataset and used it to make predictions for the next 12 months. The model
performed well with a root mean squared error of 30.5. The visualization of the
true, predicted, and training values showed that the model was able to capture the
overall trend and seasonality in the data. This demonstrates the power of LSTM in
capturing complex temporal relationships in time series data and its potential for
making accurate predictions.
Thanks to Dr. Vishnu Srinivasa Murthy Y for his guidance on this blog post. His
expertise was instrumental in helping me understand this complex topic much better.
All the images used belong to the author.

Link to data set — https://www.kaggle.com/datasets/utsavpoudel/airline-passengers

Twitter — https://twitter.com/utsavpoudel_


GROKKING THE SYSTEM DESIGN INTERVIEW

12 Microservices Patterns I Wish I Knew Before the System Design Interview


Mastering the Art of Scalable and Resilient Systems with Essential
Microservices Design Patterns

Unleash the Power of Microservices

Are you striving to build efficient, scalable, and resilient software systems? As a
software developer or senior developer, you must have come across the term
“microservices architecture.” This revolutionary approach to software development
has been adopted by many successful tech giants, such as Netflix, Amazon, and
Spotify. But, what exactly are microservices, and why should you care?
Microservices architecture is a software development technique that breaks down a
large application into smaller, manageable, and independent services. Each service
is responsible for a specific functionality and communicates with others through
well-defined APIs. This approach helps in achieving better scalability,
maintainability, and flexibility of software systems.

Did you know that 86% of developers reported increased productivity and faster
time to market when they embraced microservices? The secret behind this success
lies in understanding and implementing the right microservices patterns. These
patterns provide a solid foundation for designing and managing microservices-
based applications.

In this blog, we will dive into the top 12 microservices patterns that every software
engineer must know. By mastering these patterns, you will be well-equipped to
build powerful, fault-tolerant, and easily maintainable software systems. Are you
ready to level up your software development game? Let’s get started!

1. API Gateway Pattern: Your One-Stop-Shop for Microservices

Are you tired of managing multiple entry points for your microservices? The API
Gateway pattern is here to save the day! Acting as a single entry point for all client
requests, the API Gateway simplifies access to your microservices, offering
seamless communication between clients and services.

Why should you care about the API Gateway? First, it helps in aggregating
responses from multiple microservices, reducing the number of round trips between
clients and services. This results in improved performance and user experience.
Second, it enables you to implement cross-cutting concerns such as authentication,
logging, and rate limiting at a single place, promoting consistency and reducing
redundancy.

Imagine the convenience of having a central hub that takes care of all these
responsibilities! According to a study by RapidAPI, 68% of developers who
adopted API Gateway reported improved security and simplified management of
their microservices.

Some popular API Gateway solutions include Amazon API Gateway, Kong, and
Azure API Management. These tools provide a range of features, such as caching,
throttling, and monitoring, to help you manage your microservices efficiently.

In short, the API Gateway pattern is an essential component of a successful


microservices architecture. By embracing this pattern, you can ensure streamlined
communication, enhanced security, and simplified management of your services.
Are you ready to unlock the true potential of microservices with the API Gateway
pattern?
API Gateway Pattern

2. Service Discovery Pattern: Navigating the Microservices Maze with Ease

Are you struggling to keep track of your growing number of microservices? Worry
no more! The Service Discovery pattern is here to help you navigate the complex
world of microservices with ease. This pattern allows services to find each other
dynamically, ensuring smooth communication and reducing the need for manual
configuration.

Why is Service Discovery crucial for your microservices architecture? As your


system scales, managing the ever-changing service locations becomes increasingly
challenging. With Service Discovery, services can automatically register and
discover each other, promoting agility and flexibility in your system. In fact, 74%
of developers who adopted Service Discovery reported increased efficiency in
managing their microservices.
Take a look at Grokking Microservices Design Patterns to master these
microservices design patterns for designing scalable, resilient, and more
manageable systems.

Service Discovery can be achieved through two main approaches: client-side


discovery and server-side discovery. Client-side discovery involves the client
querying a service registry to find the target service’s location, while server-side
discovery relies on a load balancer to route requests to the appropriate service.
Tools like Netflix Eureka, Consul, and Kubernetes offer built-in Service Discovery
solutions to cater to your specific needs.

In a nutshell, the Service Discovery pattern plays a pivotal role in maintaining a


robust and adaptable microservices architecture. By implementing this pattern, you
can easily manage and scale your services without breaking a sweat. Are you
prepared to conquer the microservices maze with Service Discovery?
Circuit Breaker

3. Circuit Breaker Pattern: Shield Your Microservices from Cascading Failures

Are you concerned about the ripple effect of failures in your microservices
architecture? Meet the Circuit Breaker pattern — your ultimate safeguard against
cascading failures. This pattern monitors for failures and prevents requests from
reaching a failing service, giving it time to recover and protecting the entire system
from collapse.

Why should you implement the Circuit Breaker pattern? In a microservices


ecosystem, a single malfunctioning service can cause a domino effect, disrupting
other services that depend on it. By using Circuit Breakers, you can isolate the
faulty service and prevent further damage, ensuring the resiliency and stability of
your system. A survey revealed that 77% of developers who utilized the Circuit
Breaker pattern experienced a significant reduction in downtime.

Circuit Breakers can be easily implemented using libraries like Netflix Hystrix and
Resilience4j. These libraries offer a range of features, such as fallback methods and
monitoring, to help you manage and recover from failures efficiently.

In essence, the Circuit Breaker pattern is a must-have for building resilient and
fault-tolerant microservices. By incorporating this pattern into your architecture,
you can effectively shield your system from the adverse effects of service failures.
Are you ready to fortify your microservices with the Circuit Breaker pattern?
The System Design Interview Roadmap
Decoding the Secrets of Successful System Design Interviews. This guide is a
comprehensive resource that prepares…
www.designgurus.io

4. Load Balancing Pattern: Distribute Traffic Efficiently for High-Performance


Microservices

Are you struggling to handle the increasing traffic in your microservices


ecosystem? Introducing the Load Balancing pattern — the key to distributing traffic
evenly across your services, ensuring optimal performance, and preventing service
overload.

Why should you consider the Load Balancing pattern? As your application grows,
uneven traffic distribution can lead to service degradation or even failure. Load
Balancing ensures that no single service becomes a bottleneck, resulting in
improved performance and reliability. In fact, 81% of developers who adopted
Load Balancing reported enhanced application responsiveness and reduced service
downtime.

Load Balancing can be achieved through various algorithms, such as round-robin,


least connections, and weighted round-robin. Each algorithm has its advantages and
use cases, making it crucial to choose the right one for your system. Tools like
NGINX and HAProxy offer powerful Load Balancing solutions, allowing you to
fine-tune your traffic distribution strategy.

In summary, the Load Balancing pattern is a vital component of a robust


microservices architecture. By implementing this pattern, you can effectively
manage traffic and ensure high-performance, scalable, and fault-tolerant services.
Are you prepared to elevate your microservices’ performance with Load Balancing?
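
As a concrete illustration, the round-robin strategy mentioned above can be sketched
in a few lines of Python (the backend addresses are illustrative placeholders):

from itertools import cycle

backends = cycle(['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.3:8080'])

def next_backend():
    # round-robin: each call hands out the next backend in turn
    return next(backends)

print([next_backend() for _ in range(5)])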

Load Balancer vs. API Gateway

5. Bulkhead Pattern: Fortify Your Microservices with Advanced Fault Isolation

Are you seeking ways to minimize the impact of service failures in your
microservices architecture? Look no further than the Bulkhead pattern! This pattern
isolates services and resources, ensuring that a failure in one service doesn’t bring
down your entire system.

Why is the Bulkhead pattern essential for your microservices? In a complex


ecosystem, it’s crucial to prevent the domino effect of failures. By implementing
Bulkheads, you can compartmentalize your services, ensuring that a malfunction in
one area doesn’t cascade throughout the system. A study found that 73% of
developers who adopted the Bulkhead pattern experienced a significant reduction in
the impact of service failures on their applications.

Designing and implementing Bulkheads involves creating dedicated resources for


each service, such as separate thread pools or database connections. This way, even
if one service exhausts its resources, other services remain unaffected. Real-life
examples of Bulkhead implementation include the AWS Lambda function resource
allocation and connection pooling in databases.

In a nutshell, the Bulkhead pattern offers an advanced level of fault isolation,


making it a critical component of resilient microservices architecture. By embracing
this pattern, you can effectively minimize the impact of service failures and ensure
the stability of your system. Are you ready to fortify your microservices with the
Bulkhead pattern?

System Design Interview Survival Guide (2023): Preparation Strategies and


Practical Tips
System Design Interview Preparation: Mastering the Art of System Design.
levelup.gitconnected.com

6. CQRS Pattern: Boost Your Microservices Performance with Separation of


Concerns

Are you looking for ways to optimize the performance and scalability of your
microservices? The CQRS (Command Query Responsibility Segregation) pattern is
the answer! This pattern separates the read and write operations of your services,
allowing you to fine-tune each aspect independently for maximum efficiency.

Why should you consider the CQRS pattern? In traditional architectures, combining
read and write operations can lead to performance bottlenecks and increased
complexity. With CQRS, you can optimize each operation individually, resulting in
improved performance and easier maintenance. Studies show that 78% of
developers who adopted CQRS experienced enhanced system scalability and
responsiveness.

Implementing CQRS involves segregating your services into two distinct parts: one
for handling commands (write operations) and another for handling queries (read
operations). This separation allows you to apply different scaling, caching, and
database strategies for each operation type. Popular frameworks, such as Axon and
MediatR, offer built-in support for implementing the CQRS pattern.

In summary, the CQRS pattern is an effective approach to optimizing the


performance and scalability of your microservices. By embracing this pattern, you
can efficiently manage your read and write operations, ensuring a highly responsive
and maintainable system. Are you prepared to take your microservices performance
to new heights with CQRS?

7. Event-Driven Architecture Pattern: Empower Your Microservices with Real-Time Responsiveness

Are you searching for a way to enhance the responsiveness and adaptability of your
microservices? The Event-Driven Architecture pattern is here to help! This pattern
leverages events to trigger actions in your services, enabling real-time
responsiveness and promoting loose coupling between services.

Why is the Event-Driven Architecture pattern a game-changer? By utilizing events as triggers, you can minimize direct dependencies between services, allowing for
increased flexibility and easier system evolution. Research shows that 80% of
developers who adopted this pattern experienced improved scalability and
adaptability in their microservices.

Design Gurus has the most comprehensive list of courses on system design and coding interviews. Take a look at Grokking Microservices Design Patterns to master microservices design patterns.

Examples of event-driven systems include real-time notifications, data streaming, and IoT applications. Popular tools, such as Apache Kafka, RabbitMQ, and
Amazon Kinesis, enable you to implement this pattern effectively in your
microservices architecture.

In essence, the Event-Driven Architecture pattern offers a powerful way to enhance the responsiveness, flexibility, and scalability of your microservices. By
incorporating this pattern, you can create a dynamic system that adapts to changes
in real-time. Are you ready to unlock the full potential of your microservices with
Event-Driven Architecture?
Message Broker

8. Saga Pattern: Tackle Distributed Transactions with Confidence

Are you concerned about managing transactions across multiple microservices? Fear not! The Saga pattern offers a reliable solution for handling distributed
transactions, ensuring data consistency while maintaining the autonomy of your
services.

Why should you consider the Saga pattern? In a microservices architecture, transactions often span across multiple services, making traditional ACID
transactions unsuitable. The Saga pattern provides a way to manage these complex
scenarios while preserving the benefits of microservices. Studies indicate that 76%
of developers who implemented the Saga pattern experienced improved data
consistency and reduced transaction complexity.

Implementing the Saga pattern involves breaking down a distributed transaction into a series of local transactions, each followed by an event or a message. If a local
transaction fails, compensating transactions are executed to undo the completed
steps, maintaining data consistency. Tools like Eventuate and Axon provide built-in
support for implementing the Saga pattern in your microservices architecture.
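
A minimal, framework-free sketch of an orchestrated saga is shown below; the order steps and compensations are hypothetical, and in a real system each step would be a call to a different service.

# Each step pairs a local transaction with a compensating action.
def reserve_inventory(order): order["inventory"] = "reserved"
def release_inventory(order): order["inventory"] = "released"

def charge_payment(order):
    if order.get("card_declined"):
        raise RuntimeError("payment failed")
    order["payment"] = "charged"
def refund_payment(order): order["payment"] = "refunded"

def create_shipment(order): order["shipment"] = "created"
def cancel_shipment(order): order["shipment"] = "cancelled"

SAGA_STEPS = [
    (reserve_inventory, release_inventory),
    (charge_payment, refund_payment),
    (create_shipment, cancel_shipment),
]

def run_saga(order):
    completed = []
    try:
        for action, compensation in SAGA_STEPS:
            action(order)
            completed.append(compensation)
    except Exception:
        # Undo the completed steps in reverse order to restore consistency.
        for compensation in reversed(completed):
            compensation(order)
        return "saga rolled back"
    return "saga committed"

print(run_saga({"card_declined": True}))  # -> saga rolled back
print(run_saga({}))                       # -> saga committed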

In summary, the Saga pattern is an indispensable tool for managing distributed transactions in a microservices ecosystem. By adopting this pattern, you can ensure
data consistency and reduce transaction complexity while preserving the autonomy
of your services.


9. Retry Pattern: Boost Your Microservices Resilience with Graceful Error Recovery

Are you seeking ways to improve your microservices’ resilience in the face of
transient failures? The Retry pattern has got you covered!

This pattern involves automatically retrying a failed operation, increasing the chances of successful execution and minimizing the impact of temporary issues.

Why should you adopt the Retry pattern? In a microservices ecosystem, transient
failures such as network hiccups or service timeouts are inevitable. The Retry
pattern enables your services to recover gracefully from these issues, enhancing
overall system stability.
The key to successful implementation lies in defining a suitable retry strategy. This
strategy should include factors like the maximum number of retries, delay between
retries, and any exponential backoff. Libraries like Polly, Resilience4j, and Spring
Retry offer built-in support for implementing the Retry pattern in your
microservices.
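
As a rough sketch of such a strategy in plain Python (not tied to any of the libraries above), the helper below retries a hypothetical flaky call with exponential backoff:

import random
import time

def retry(operation, max_attempts=4, base_delay=0.5, backoff=2.0):
    """Retry `operation` on failure, doubling the delay after each attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # retry budget exhausted; surface the error
            delay = base_delay * (backoff ** (attempt - 1))
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)

def flaky_call():
    # Hypothetical downstream call that fails about half of the time.
    if random.random() < 0.5:
        raise TimeoutError("simulated transient failure")
    return "success"

print(retry(flaky_call))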

In a nutshell, the Retry pattern is an essential ingredient for building resilient microservices that can effectively recover from transient failures. By embracing this
pattern, you can ensure a more stable and reliable system in the face of temporary
issues.

10. Backends for Frontends Pattern (BFF): Optimize User Experience with
Tailored Service Aggregation

Are you looking to deliver a seamless user experience across multiple platforms?
Look no further than the Backends for Frontends (BFF) pattern! This pattern
involves creating dedicated backend services for each frontend, ensuring optimal
performance and user experience tailored to each platform.

Why should you consider the BFF pattern? In a microservices architecture, a single
backend service might not cater to the diverse requirements of different frontends.
The BFF pattern enables you to customize your backend services for each platform,
enhancing performance and user experience. A study found that 82% of developers
who adopted the BFF pattern reported improved user satisfaction and reduced
development complexity.
To implement the BFF pattern, you create separate backend services for each
frontend (e.g., web, mobile, IoT), aggregating and adapting the data specifically for
each platform’s requirements. Tools like GraphQL, Apollo Server, and Express.js
can facilitate the creation of custom backend services for your frontends.
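
The following small sketch illustrates the idea with hypothetical product data: the web BFF returns the full payload, while the mobile BFF trims it for bandwidth-constrained clients.

# Shared downstream data, as a microservice might return it (hypothetical shape).
PRODUCT = {
    "id": "p1",
    "name": "Keyboard",
    "price": 49.9,
    "description": "A long marketing description ...",
    "images": ["hero.jpg", "side.jpg", "back.jpg"],
}

def web_bff_product(product):
    # The web frontend can render the full catalogue entry.
    return product

def mobile_bff_product(product):
    # The mobile frontend gets a trimmed payload to save bandwidth.
    return {
        "id": product["id"],
        "name": product["name"],
        "price": product["price"],
        "thumbnail": product["images"][0],
    }

print(mobile_bff_product(PRODUCT))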

In conclusion, the BFF pattern is a powerful approach to optimizing the user experience across multiple platforms in a microservices ecosystem. By adopting
this pattern, you can tailor your services to each platform’s needs, ensuring top-
notch performance and user satisfaction. Are you ready to optimize your user
experience with the BFF pattern?

BFF Pattern

11. Sidecar Pattern: Supercharge Your Microservices with Modular Functionality

Do you want to extend your microservices’ functionality without compromising
their autonomy? The Sidecar pattern is your answer! This pattern allows you to
attach additional components to your services, providing modular functionality
without altering the core service itself.

Why should you adopt the Sidecar pattern? In a microservices architecture, maintaining service independence is crucial. The Sidecar pattern enables you to add
new features or cross-cutting concerns without affecting the main service,
preserving modularity and maintainability. Research shows that 77% of developers
who implemented the Sidecar pattern experienced increased agility and reduced
development complexity.

Implementing the Sidecar pattern involves deploying a separate container alongside your main service container. This “sidecar” container handles specific tasks such as
logging, monitoring, or security, allowing your main service to focus on its core
functionality. Examples of Sidecar implementation include the Envoy proxy in a
service mesh and the Fluentd logging sidecar.

In summary, the Sidecar pattern is an effective way to extend your microservices’ functionality while preserving their modularity and independence. By embracing
this pattern, you can enhance your services with ease, ensuring a scalable and
maintainable system. Are you ready to supercharge your microservices with the
Sidecar pattern?

12. Strangler Pattern: Transform Your Monolith into Microservices with
Confidence

Are you planning to migrate from a monolithic architecture to microservices but unsure where to start? The Strangler pattern is here to guide you! This pattern
enables you to gradually replace your monolithic system with microservices,
ensuring a smooth and risk-free transition.

Why should you adopt the Strangler pattern? Migrating from a monolithic
architecture to microservices can be challenging and risky. The Strangler pattern
allows for incremental replacement, minimizing downtime and risk while
maintaining business continuity. Studies reveal that 81% of developers who used
the Strangler pattern experienced a smoother migration with fewer issues.

To implement the Strangler pattern, you start by identifying a specific functionality within your monolithic system. You then create a new microservice to handle that
functionality and redirect requests to the new service using an API gateway or
proxy. Over time, you repeat this process for other functionalities until the entire
monolith is replaced with microservices.

In short, the Strangler pattern is an invaluable tool for transforming your monolithic
system into a microservices architecture with confidence. By following this pattern,
you can ensure a smooth and risk-free migration, setting your organization up for
success in the microservices era. Are you ready to embrace the Strangler pattern
and revolutionize your architecture?
Conclusion: Unlock the Full Potential of Your Microservices with These Top
Patterns

In today’s fast-paced software development landscape, the need for scalable, maintainable, and resilient systems is paramount. By mastering these top 12
microservices patterns, you can harness the full potential of your microservices
architecture, ensuring success in the ever-evolving world of software engineering.

Why are these patterns essential? Research shows that developers who implement
these patterns experience improved system performance, scalability, and
maintainability. By leveraging these patterns, you can tackle complex challenges
like distributed transactions, service resilience, and user experience optimization
with confidence.

As a software engineer, staying ahead of the curve is crucial for your professional
growth. These patterns provide you with the essential tools to excel in the
microservices domain, setting you apart from your peers and enabling you to
deliver outstanding results.

In summary, embracing these top 12 microservices patterns is your key to unlocking the full potential of your microservices architecture. Are you prepared to
take your software engineering skills to the next level and lead the charge in
microservices innovation?

Take a look at Grokking Microservices Design Patterns to master these microservices design patterns for designing scalable, resilient, and more
manageable systems.
16 System Design Concepts I Wish I Knew Before the Interview.

Mastering System Design Interview: Essential Concepts for Every Software Engineer

Arslan Ahmad

System Design Master Template

To excel in system design, one of the most crucial aspects is to develop a deep
understanding of fundamental system design concepts such as Load
Balancing, Caching, Partitioning, Replication, Databases, and Proxies.

Through my own experiences, I’ve identified 16 key concepts that can make a
significant difference in your ability to tackle system design problems. These
concepts range from understanding the intricacies of API gateway and mastering
load-balancing techniques to grasping the importance of CDNs and appreciating the
role of caching in modern distributed systems. By the end of this blog, you’ll have a
comprehensive understanding of these essential ideas and the confidence to apply
them in your next interview.

System design interviews are unstructured by nature. During the interview, it is difficult to keep track of things and be sure that you have touched upon all the
essential aspects of the design. To simplify this process, I have developed a system
design master template that should guide you in answering any system design
interview question. Take a look at the featured image to gain insight into the key
components that may be involved in any system design.

Keeping this master template in mind, we will discuss the 16 essential system
design concepts. Here is their brief description:

1. Domain Name System (DNS)


2. Load Balancer
3. API Gateway
4. CDN
5. Forward Proxy vs. Reverse Proxy
6. Caching
7. Data Partitioning
8. Database Replication
9. Distributed Messaging Systems
10. Microservices
11. NoSQL Databases
12. Database Index
13. Distributed File Systems
14. Notification System
15. Full-text Search
16. Distributed Coordination Services

1. Domain Name System (DNS)

Domain Name System (DNS) is a fundamental component of the internet infrastructure that translates human-friendly domain names into their corresponding IP addresses. It functions like a phonebook for the internet, allowing users to access websites and services by typing in easily memorable domain names, such as www.designgurus.io, rather than the numerical IP addresses like “192.0.2.1” that computers use to identify each other.

When you enter a domain name into your web browser, the DNS is responsible for
locating the associated IP address and directing your request to the correct server.
The process begins with your computer sending a query to a recursive resolver,
which then searches a series of DNS servers, starting with the root server, followed
by the Top-Level Domain (TLD) server, and finally the authoritative name server.
Once the IP address is found, the recursive resolver returns it to your computer,
allowing your browser to establish a connection with the target server and access
the desired content.
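
You can exercise this resolution path directly from Python’s standard library; a small sketch is below (the addresses printed will vary depending on your resolver and the domain queried).

import socket

# Ask the operating system's resolver (which in turn queries DNS) for an address.
hostname = "www.example.com"
ip_address = socket.gethostbyname(hostname)
print(f"{hostname} resolves to {ip_address}")

# getaddrinfo returns richer records (IPv4/IPv6 families and socket addresses).
for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, 443):
    print(family, sockaddr)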

DNS Resolver

2. Load Balancer

A load balancer is a networking device or software that distributes incoming network traffic across multiple servers to ensure optimal resource utilization, reduce
latency, and maintain high availability. It plays a vital role in scaling applications
and managing server workloads efficiently, especially in situations where there is a
sudden spike in traffic or uneven distribution of requests among servers.
Load balancers use different algorithms to determine how to distribute incoming traffic. Common algorithms include the following (a short sketch of two of these strategies follows the list):

1. Round Robin: Requests are distributed sequentially and evenly across all available servers in a cyclical manner.

2. Least Connections: The load balancer assigns requests to the server with
the fewest active connections, prioritizing less-busy servers.

3. IP Hash: The client’s IP address is hashed, and the resulting value is used to determine which server the request should be directed to. This
method ensures that a specific client’s requests are always routed to the
same server, helping maintain session persistence.
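
A minimal Python sketch of Round Robin and IP Hash selection, with hypothetical server addresses, might look like this:

import hashlib
from itertools import cycle

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round Robin: hand out servers in a repeating cycle.
round_robin = cycle(SERVERS)

def pick_round_robin():
    return next(round_robin)

# IP Hash: the same client IP always maps to the same server.
def pick_by_ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

print([pick_round_robin() for _ in range(4)])  # cycles through the pool
print(pick_by_ip_hash("203.0.113.7"))          # sticky for this client
print(pick_by_ip_hash("203.0.113.7"))          # same server again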

Load Balancer

3. API Gateway
An API Gateway is a server or service that acts as an intermediary between external
clients and the internal microservices or API-based backend services of an
application. It is a crucial component in modern architectures, especially in
microservices-based systems, where it simplifies the communication process and
provides a single entry point for clients to access various services.

The main functions of an API Gateway include:

1. Request Routing: It directs incoming API requests from clients to the appropriate backend service or microservice, based on predefined rules
and configurations.

2. Authentication and Authorization: The API Gateway can handle user authentication and authorization, ensuring that only authorized clients
can access the services. It can verify API keys, tokens, or other
credentials before routing requests to the backend services.

3. Rate Limiting and Throttling: To protect backend services from excessive load or abuse, the API Gateway can enforce rate limits or
throttle requests from clients based on predefined policies.

4. Caching: To reduce latency and backend load, the API Gateway can
cache frequently-used responses, serving them directly to clients without
the need to query the backend services.

5. Request and Response Transformation: The API Gateway can modify requests and responses, such as converting data formats, adding or
removing headers, or modifying query parameters, to ensure
compatibility between clients and services.

API Gateway

Check Grokking the System Design Interview for a list of common system design
interview questions and basic concepts.

4. CDN
A Content Delivery Network (CDN) is a distributed network of servers that store
and deliver content, such as images, videos, stylesheets, and scripts, to users from
geographically closer locations. CDNs are designed to improve the performance,
speed, and reliability of content delivery to end-users, regardless of their location
relative to the origin server.

Here’s how a CDN works:

1. When a user requests content from a website or application, the request is directed to the nearest CDN server, also known as an edge server.

2. If the edge server has the requested content cached, it directly serves the
content to the user. This reduces latency and improves the user
experience, as the content travels a shorter distance.

3. If the content is not cached on the edge server, the CDN retrieves it from
the origin server or another nearby CDN server. Once the content is
fetched, it is cached on the edge server and served to the user.

4. To ensure the content remains up-to-date, the CDN periodically checks the origin server for changes and updates its cache accordingly.
5. Forward Proxy vs. Reverse Proxy

A forward proxy, also known as a “proxy server,” or simply “proxy,” is a server that sits in front of one or more client machines and acts as an intermediary between
the clients and the internet. When a client machine makes a request to a resource on
the internet, the request is first sent to the forward proxy. The forward proxy then
forwards the request to the internet on behalf of the client machine and returns the
response to the client machine.

A reverse proxy is a server that sits in front of one or more web servers and acts as
an intermediary between the web servers and the Internet. When a client makes a
request to a resource on the internet, the request is first sent to the reverse proxy.
The reverse proxy then forwards the request to one of the web servers, which
returns the response to the reverse proxy. The reverse proxy then returns the
response to the client.

Forward Proxy vs. Reverse Proxy


Check Grokking the Advanced System Design Interview for architectural reviews
of famous distributed systems.

6. Caching

The cache is a high-speed storage layer that sits between the application and the
original source of the data, such as a database, a file system, or a remote web
service. When data is requested by the application, it is first checked in the cache. If
the data is found in the cache, it is returned to the application. If the data is not
found in the cache, it is retrieved from its original source, stored in the cache for
future use, and returned to the application. In a distributed system, caching can be
done at multiple places, for example: Client, DNS, CDN, Load Balancer, API
Gateway, Server, Database, etc.
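
The cache-aside flow described above can be sketched in a few lines of Python; the slow database lookup here is hypothetical and only simulates an expensive query.

import time

cache = {}  # in-process cache; in production this is often Redis or Memcached

def slow_database_lookup(key):
    time.sleep(0.5)                # simulate an expensive query (hypothetical)
    return f"value-for-{key}"

def get(key):
    # Cache-aside: check the cache first, fall back to the source, then populate.
    if key in cache:
        return cache[key]
    value = slow_database_lookup(key)
    cache[key] = value
    return value

start = time.time()
get("user:42")                     # miss: pays the database cost
print(f"first call:  {time.time() - start:.2f}s")

start = time.time()
get("user:42")                     # hit: served from the cache
print(f"second call: {time.time() - start:.2f}s")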
What and where to cache

7. Data Partitioning

In a database, horizontal partitioning, also known as sharding, involves dividing the rows of a table into smaller tables and storing them on different servers or
database instances. This is done to distribute the load of a database across multiple
servers and to improve performance.
On the other hand, vertical partitioning involves dividing the columns of a table
into separate tables. This is done to reduce the number of columns in a table and to
improve the performance of queries that only access a small number of columns.
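
A tiny sketch of hash-based horizontal partitioning is shown below; the shard count is hypothetical, and real systems usually add consistent hashing and rebalancing on top of this idea.

import hashlib

NUM_SHARDS = 4  # hypothetical number of database servers

def shard_for(user_id):
    """Horizontal partitioning: route a row to a shard by hashing its key."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

shards = {i: [] for i in range(NUM_SHARDS)}
for user_id in range(1, 11):
    shards[shard_for(user_id)].append(user_id)

for shard, rows in shards.items():
    print(f"shard {shard}: users {rows}")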

Data partitioning

8. Database Replication

Database replication is a technique used to maintain multiple copies of the same database across different servers or locations. The primary purpose of database
replication is to improve data availability, redundancy, and fault tolerance, ensuring
that the system continues to function even in the case of hardware failures or other
issues.
In a replicated database setup, one server acts as the primary (or master) database,
while others function as replicas (or slaves). The process involves synchronizing
data between the primary database and replicas, so they all have the same up-to-
date information. Database replication offers several benefits, including:

1. Improved Performance: By distributing read queries among multiple replicas, you can reduce the load on the primary database and improve
query response times.

2. High Availability: In the event of a failure or downtime on the primary database, replicas can continue to serve data, ensuring uninterrupted
access to the application.

3. Enhanced Data Protection: Having multiple copies of the database across different locations helps protect against data loss due to hardware
failures or other disasters.

4. Load Balancing: Replicas can handle read queries, which allows for
better load distribution and reduces the overall strain on the primary
database.


9. Distributed Messaging Systems


Distributed messaging systems enable the exchange of messages between multiple,
potentially geographically-dispersed applications, services, or components in a
reliable, scalable, and fault-tolerant manner. They facilitate communication by
decoupling the sender and receiver components, allowing them to evolve and
operate independently. Distributed messaging systems are particularly useful in
large-scale or complex systems, such as those found in microservices architectures
or distributed computing environments. Examples of such systems are Apache
Kafka and RabbitMQ.

10. Microservices

Microservices are an architectural style in which an application is structured as a collection of small, loosely-coupled, and independently deployable services. Each
microservice is responsible for a specific piece of functionality or domain within
the application, and communicates with other microservices through well-defined
APIs. This approach is a departure from the traditional monolithic architecture,
where an application is built as a single, tightly-coupled unit.

The main characteristics of microservices are:

1. Single Responsibility: Each microservice focuses on a specific functionality or domain, adhering to the Single Responsibility Principle.
This makes the services easier to understand, develop, and maintain.

2. Independence: Microservices can be developed, deployed, and scaled independently of one another. This allows for increased flexibility and
agility in the development process, as teams can work on different
services concurrently without impacting the entire system.
3. Decentralized: Microservices are typically decentralized, with each
service owning its data and business logic. This encourages separation of
concerns and enables teams to make decisions and choose technologies
that best suit their specific requirements.

4. Communication: Microservices communicate with each other using lightweight protocols such as HTTP/REST, gRPC, or message queues.
This promotes interoperability and makes it easier to integrate new
services or replace existing ones.

5. Fault Tolerance: Since microservices are independent, a failure in one service does not necessarily cause the entire system to fail. This can help
improve the overall resiliency of the application.

11. NoSQL Databases

NoSQL databases, or “Not Only SQL” databases, are non-relational databases designed to store, manage, and retrieve unstructured or semi-structured data. They
offer an alternative to traditional relational databases, which rely on structured data
and predefined schemas. NoSQL databases have become popular due to their
flexibility, scalability, and ability to handle large volumes of data, making them
well-suited for modern applications, big data processing, and real-time analytics.

NoSQL databases can be categorized into four main types:

1. Document-Based: These databases store data in document-like structures, such as JSON or BSON. Each document is self-contained and
can have its own unique structure, making them suitable for handling
heterogeneous data. Examples of document-based NoSQL databases
include MongoDB and Couchbase.

2. Key-Value: These databases store data as key-value pairs, where the key
acts as a unique identifier, and the value holds the associated data. Key-
value databases are highly efficient for simple read and write operations,
and they can be easily partitioned and scaled horizontally. Examples of
key-value NoSQL databases include Redis and Amazon DynamoDB.

3. Column-Family: These databases store data in column families, which are groups of related columns. They are designed to handle write-heavy workloads and are highly efficient for querying data with known row and column keys. Examples of column-family NoSQL databases include
Apache Cassandra and HBase.

4. Graph-Based: These databases are designed for storing and querying data that has complex relationships and interconnected structures, such as
social networks or recommendation systems. Graph databases use nodes,
edges, and properties to represent and store data, making it easier to
perform complex traversals and relationship-based queries. Examples of
graph-based NoSQL databases include Neo4j and Amazon Neptune.

Types of NoSQL databases

12. Database Index


Database indexes are data structures that improve the speed and efficiency of
querying operations in a database. They work similarly to an index in a book,
allowing the database management system (DBMS) to quickly locate the data
associated with a specific value or set of values, without having to search through
every row in a table. By providing a more direct path to the desired data, indexes
can significantly reduce the time it takes to retrieve information from a database.

Indexes are usually built on one or more columns of a database table. The most
common type of index is the B-tree index, which organizes data in a hierarchical
tree structure, allowing for fast search, insertion, and deletion operations. There are
other types of indexes, such as bitmap indexes and hash indexes, each with their
specific use cases and advantages.
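
You can see the effect of an index with Python’s built-in sqlite3 module; the table and data below are hypothetical, but EXPLAIN QUERY PLAN shows that the filtered lookup uses the index rather than a full table scan.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO users (email, country) VALUES (?, ?)",
    [(f"user{i}@example.com", "IN" if i % 2 else "EG") for i in range(1000)],
)

# A B-tree index on the column we filter by.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# The query plan now mentions the index instead of a full scan of the table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("user500@example.com",),
).fetchall()
print(plan)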

While indexes can significantly improve query performance, they also have some
trade-offs:

1. Storage Space: Indexes consume additional storage space, as they create and maintain separate data structures alongside the original table data.

2. Write Performance: When data is inserted, updated, or deleted in a table, the associated indexes must also be updated, which can slow down
write operations.
Database Index

13. Distributed File Systems

Distributed file systems are storage solutions designed to manage and provide
access to files and directories across multiple servers, nodes, or machines, often
distributed over a network. They enable users and applications to access and
manipulate files as if they were stored on a local file system, even though the actual
files might be physically stored on multiple remote servers. Distributed file systems
are often used in large-scale or distributed computing environments to provide fault
tolerance, high availability, and improved performance.

14. Notification System

These are used to send notifications or alerts to users, such as emails, push
notifications, or text messages.

15. Full-text Search

Full-text search enables users to search for specific words or phrases within an app
or website. When a user queries, the app or website returns the most relevant
results. To do this quickly and efficiently, full-text search relies on an inverted
index, which is a data structure that maps words or phrases to the documents in
which they appear. An example of such a system is Elasticsearch.
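
A toy inverted index can be built in a few lines of Python; the documents here are hypothetical, and real engines add tokenization, stemming, and ranking on top of this structure.

from collections import defaultdict

documents = {
    1: "load balancers distribute incoming traffic",
    2: "an inverted index maps words to documents",
    3: "caching reduces load on the database",
}

# Build the inverted index: word -> set of document ids containing it.
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        inverted_index[word].add(doc_id)

def search(query):
    """Return the ids of documents containing every word of the query."""
    results = set(documents)
    for word in query.lower().split():
        results &= inverted_index.get(word, set())
    return sorted(results)

print(search("load"))            # -> [1, 3]
print(search("inverted index"))  # -> [2]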

16. Distributed Coordination Services

Distributed coordination services are systems designed to manage and coordinate the activities of distributed applications, services, or nodes in a reliable, efficient,
and fault-tolerant manner. They help maintain consistency, handle distributed
synchronization, and manage the configuration and state of various components in a
distributed environment. Distributed coordination services are particularly useful in
large-scale or complex systems, such as those found in microservices architectures,
distributed computing environments, or clustered databases. Examples of such
services are Apache ZooKeeper, etcd, and Consul.

Conclusion

Maximize your chances of acing system design interviews by using the aforementioned system design concepts and the template. Here is a list of
common system design interview questions:

1. Designing a File-sharing Service Like Google Drive or Dropbox.

2. Designing a Video Streaming Platform

3. Designing a URL Shortening Service

4. Designing a Web Crawler

5. Designing Uber

6. Designing Facebook Messenger

7. Designing Twitter Search

Take a look at Grokking the System Design Interview for a detailed discussion of
such system design interview questions.
Check Grokking System Design Fundamentals for a list of common system
design concepts.

https://www.designgurus.io/blog/system-design-interview-fundamentals

To learn software architecture and practice advanced system design interview questions, take a look at Grokking the Advanced System Design Interview.

METHODS OF TIME SERIES

TIME SERIES
A time series is a set of data collected and arranged in order of time. According to Croxton and Cowden, "A time series consists of data arranged chronologically." Such data may be a series of temperatures of a patient, a series showing the number of suicides in different months of a year, and so on. The analysis of a time series means separating out the different components which influence the values of the series. The variations in a time series can be divided into two parts: long term variations and short term variations. Long term variations can be divided into two parts: trend (or secular trend) and cyclical variations. Short term variations can be divided into two parts: seasonal variations and irregular variations.

METHODS FOR TIME SERIES ANALYSIS
In business forecasting, it is important to analyze the characteristic movements of the variations in the given time series. The following methods serve as tools for this analysis:

1. Methods for measurement of secular trend
   i. Freehand curve method (graphical method)
   ii. Method of selected points
   iii. Method of semi-averages
   iv. Method of moving averages
   v. Method of least squares
2. Methods for measurement of seasonal variations
   i. Method of simple averages
   ii. Ratio to trend method
   iii. Ratio to moving average method
   iv. Method of link relatives
3. Methods for measurement of cyclical variations
4. Methods for measurement of irregular variations

METHODS FOR MEASUREMENT OF SECULAR TREND
The following are the principal methods of measuring trend from a given time series.
1. GRAPHICAL OR FREEHAND CURVE METHOD
This is the simplest method of studying trend. In this method the given time series data are plotted on graph paper, taking time on the X-axis and the other variable on the Y-axis. The graph obtained will be irregular, as it includes short-run oscillations. We may observe the up and down movement of the curve, and if a smooth freehand curve is drawn passing approximately through all the points of the curve previously drawn, it eliminates the short-run oscillations (seasonal, cyclical and irregular variations) and shows the long-period general tendency of the data. This is exactly what is meant by trend. However, it is very difficult to draw a freehand smooth curve, and different persons are likely to draw different curves from the same data. The following points must be kept in mind when drawing a freehand smooth curve:
1. The curve should be smooth.
2. The number of points above the line or curve should be equal to the number of points below it.
3. The sum of the vertical deviations of the points above the smoothed line should be equal to the sum of the vertical deviations of the points below the line. In this way the positive deviations cancel the negative deviations. These deviations are the effects of seasonal, cyclical and irregular variations, and by this process they are eliminated.
4. The sum of the squares of the vertical deviations from the trend curve should be a minimum. (This is one of the characteristics of the trend line fitted by the method of least squares.)
The trend values can be read for various time periods by locating them on the trend line against each time period. The following example illustrates the fitting of a freehand curve to a set of time series values.

Example: The table below shows sales data for nine years:

Year:                1990  1991  1992  1993  1994  1995  1996  1997  1998
Sales (lakh units):    65    95   115    63   120   100   150   135   172

If we draw a graph taking the year on the x-axis and sales on the y-axis, it will be irregular. Drawing a freehand curve passing approximately through all these points then represents the trend line.

MERITS:
1. It is a simple method of estimating trend which requires no mathematical calculations.
2. It is a flexible method as compared to rigid mathematical trends and, therefore, a better representative of the trend of the data.
3. This method can be used even if the trend is not linear.
4. If the observations are relatively stable, the trend can easily be approximated by this method.
5. Being a non-mathematical method, it can be applied even by a layperson.

DEMERITS:
1. It is a subjective method. The trend values obtained by different statisticians would be different and hence not reliable.
2. Predictions made on the basis of this method are of little value.

2. METHOD OF SELECTED POINTS
In this method, two points considered to be the most representative or normal are joined by a straight line to get the secular trend. This, again, is a subjective method, since different persons may have different opinions regarding the representative points. Further, only a linear trend can be determined by this method.
3. METHOD OF SEMI-AVERAGES
Under this method, as the name itself suggests, semi-averages are calculated to find the trend values. By semi-averages is meant the averages of the two halves of a series. The given series is divided into two equal parts (halves) and the arithmetic mean of the values of each half is calculated. The computed means are termed semi-averages. Each semi-average is paired with the centre of the time period of its half. The two pairs are then plotted on graph paper and the points are joined by a straight line to get the trend. It should be noted that if the data cover an even number of years, they can easily be divided into two halves. But if they cover an odd number of years, we leave out the middle year of the time series, and the two halves consist of the periods on each side of the middle year.

MERITS:
1. It is a simple method of measuring trend.
2. It is an objective method, because anyone applying it to a given set of data would get identical trend values.

DEMERITS:
1. This method can give only a linear trend of the data, irrespective of whether such a trend exists or not.
2. It is only a crude method of measuring trend, since we do not know whether the effects of the other components are completely eliminated or not.
4. METHOD OF MOVING AVERAGES
This method is based on the principle that the total effect of periodic variations at different points of time in a cycle gets completely neutralized, i.e. ∑S(t) = 0 over one year and ∑C(t) = 0 over the period of the cyclical variations. In the method of moving averages, successive arithmetic averages are computed from overlapping groups of successive values of a time series. Each group includes all the observations in a given time interval, termed the period of the moving average. The next group is obtained by replacing the oldest value with the next value in the series. The averages of such groups are known as the moving averages. The moving average of a group is always shown at the centre of its period. The process of computing moving averages smoothens out the fluctuations in the time series data. If the trend is linear and the oscillatory variations are regular, a moving average whose period equals the period of the oscillatory variations will largely eliminate those variations, because the average of a number of observations must lie between the smallest and the largest observation. It should be noted that the larger the period of the moving average, the greater the reduction in the effect of the random component, but the more information is lost at the two ends of the data, and the more the curvature of a curvilinear trend is flattened. When the trend is not linear, the moving averages give biased rather than actual trend values. Suppose that the successive observations, taken at equal intervals of time (say yearly), are Y1, Y2, Y3, ...

Moving average when the period is odd
For a three-yearly moving average, we obtain the average of the first three consecutive years (Y1, Y2, Y3) and place it against time t = 2; then the average of the next three consecutive years (Y2, Y3, Y4) and place it against time t = 3; and so on. This is illustrated below:

Time (t)   Observation Yt   3-yearly moving total   3-yearly moving average
1          Y1               -                       -
2          Y2               Y1 + Y2 + Y3            (Y1 + Y2 + Y3)/3
3          Y3               Y2 + Y3 + Y4            (Y2 + Y3 + Y4)/3
4          Y4               Y3 + Y4 + Y5            (Y3 + Y4 + Y5)/3
5          Y5               -                       -

It should be noted that for an odd-period moving average, it is not possible to get the moving averages for the first and the last periods.

Moving average when the period is even
For an even-order moving average, two averaging processes are necessary in order to centre the moving average against periods rather than between periods. For example, for a four-yearly moving average we first obtain the average A1 = (Y1 + Y2 + Y3 + Y4)/4 of the first four years and place it between t = 2 and t = 3; then the average A2 = (Y2 + Y3 + Y4 + Y5)/4 of the next four years and place it between t = 3 and t = 4; and finally we take the average (A1 + A2)/2 of the two averages and place it against time t = 3. Thus the moving average is brought against a time period rather than between periods. The same procedure is repeated for further values. This is tabulated below:

Time (t)   Observation Yt   4-yearly moving average (between periods)      Centred value
1          Y1
2          Y2
                            A1 = (Y1 + Y2 + Y3 + Y4)/4  (between t=2, t=3)
3          Y3                                                               (A1 + A2)/2
                            A2 = (Y2 + Y3 + Y4 + Y5)/4  (between t=3, t=4)
4          Y4

It should be noted that when the period of the moving average is even, the computed average corresponds to the middle of the two middle-most periods.

MERITS:
1. This method is easy to understand and easy to use, because there are no mathematical complexities involved.
2. It is an objective method in the sense that anybody working on a problem with this method will get the same trend values. In this respect it is better than the freehand curve method.
3. It is a flexible method in the sense that if a few more observations are added, the entire calculation does not change. This is not the case with the semi-average method.
4. When the period of the oscillatory movements is equal to the period of the moving average, these movements are completely eliminated.
5. By the indirect use of this method, it is also possible to isolate the seasonal, cyclical and random components.

DEMERITS:
1. It is not possible to calculate trend values for all the items of the series. Some information is always lost at its ends.
2. This method can determine accurate values of the trend only if the oscillatory and random fluctuations are uniform in terms of period and amplitude and the trend is, at least, approximately linear. However, these conditions are rarely met in practice. When the trend is not linear, the moving averages will not give correct values of the trend.
3. The trend values obtained by moving averages may not follow any mathematical pattern, i.e. the method fails to set up a functional relationship between the values of X (time) and Y (values), and thus cannot be used for forecasting, which perhaps is the main task of any time series analysis.
4. The selection of the period of the moving average is a difficult task, and a great deal of care is needed to determine it.
5. Like the arithmetic mean, the moving averages are strongly affected by extreme values.
5. METHOD OF LEAST SQUARES
This is one of the most popular methods of fitting a mathematical trend. The fitted trend is termed the best in the sense that the sum of squares of the deviations of the observations from it is minimized. The method of least squares may be used either to fit a linear trend or a nonlinear trend (parabolic and exponential trends).

FITTING OF LINEAR TREND
Given the data (Yt, t) for n periods, where t denotes the time period (year, month, day, etc.), we have to find the values of the two constants a and b of the linear trend equation:

Yt = a + bt

The value of a is merely the Y-intercept, i.e. the height of the line above the origin (when t = 0, Y = a). The constant b represents the slope of the trend line: when b is positive the slope is upward, and when b is negative the slope is downward. This line is termed the line of best fit because it is fitted so that the total of the squared deviations of the given data from the line is a minimum; hence the term "least squares". Using the least squares method, the normal equations for obtaining the values of a and b are:

∑Yt = na + b∑t
∑tYt = a∑t + b∑t²

Let X = t − A, such that ∑X = 0, where A denotes the year of origin. The above equations can then be written as:

∑Y = na + b∑X
∑XY = a∑X + b∑X²

Since ∑X = 0, i.e. the deviations from the actual mean are zero, we can write:

a = ∑Y / n   and   b = ∑XY / ∑X²
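The short Python sketch below applies these two formulas to the nine-year sales series used in the freehand-curve example above; choosing the middle year as the origin makes ∑X = 0, so a and b follow directly.

# Fit Y = a + bX by least squares, choosing the origin so that sum(X) = 0.
years = list(range(1990, 1999))
sales = [65, 95, 115, 63, 120, 100, 150, 135, 172]

n = len(years)
origin = sum(years) / n                  # the middle year (1994), so sum(X) = 0
X = [t - origin for t in years]

a = sum(sales) / n                                                  # a = sum(Y) / n
b = sum(x * y for x, y in zip(X, sales)) / sum(x * x for x in X)    # b = sum(XY) / sum(X^2)

trend = [a + b * x for x in X]
print(f"a = {a:.2f}, b = {b:.2f}")
print([round(t, 1) for t in trend])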
FITTING OF PARABOLIC TREND
The mathematical form of a parabolic trend is given by:

Yt = a + bt + ct²

Here a, b and c are constants to be determined from the given data. Using the method of least squares, the normal equations for the simultaneous solution of a, b and c are:

∑Y = na + b∑t + c∑t²
∑tY = a∑t + b∑t² + c∑t³
∑t²Y = a∑t² + b∑t³ + c∑t⁴

By selecting a suitable year of origin, i.e. defining X = t − origin such that ∑X = 0, the computational work can be considerably simplified. Note also that if ∑X = 0, then ∑X³ is also equal to zero. Thus, the above equations can be rewritten as:

∑Y = na + c∑X²                (1)
∑XY = b∑X²                    (2)
∑X²Y = a∑X² + c∑X⁴            (3)

From equation (2), we get b = ∑XY / ∑X².
From equation (1), we get a = (∑Y − c∑X²) / n.
From equation (3), we get c = [n∑X²Y − (∑X²)(∑Y)] / [n∑X⁴ − (∑X²)²], or equivalently c = (∑X²Y − a∑X²) / ∑X⁴.
These are the three relations used to find the values of the constants a, b and c.

FITTING OF EXPONENTIAL TREND
The general form of an exponential trend is:

Y = a·bᵗ

where a and b are constants to be determined from the observed data. Taking logarithms of both sides, we have log Y = log a + t log b. This is a linear equation in log Y and t, and it can be fitted in the same way as the linear trend. Let A = log a and B = log b; then the above equation can be written as:

log Y = A + Bt

The normal equations based on the principle of least squares are:

∑log Y = nA + B∑t
∑t log Y = A∑t + B∑t²

By selecting a suitable origin, i.e. defining X = t − origin such that ∑X = 0, the computational work can be simplified. The values of A and B are then given by:

A = ∑log Y / n   and   B = ∑X log Y / ∑X²

Thus, the fitted trend equation can be written as:

log Y = A + BX, or Y = antilog(A + BX) = antilog(log a + X log b) = antilog(log a·bˣ) = a·bˣ

MERITS:
1. Given the mathematical form of the trend to be fitted, the least squares method is an objective method.
2. Unlike the moving average method, it is possible to compute trend values for all the periods and to predict the value for a period lying outside the observed data.
3. The results of the method of least squares are most satisfactory because the fitted trend satisfies the two most important properties, i.e. (1) ∑(Y0 − Yt) = 0 and (2) ∑(Y0 − Yt)² is minimum, where Y0 denotes the observed values and Yt the calculated trend values. The first property implies that the fitted trend is positioned so that the sum of the deviations of observations above and below it is zero. The second property implies that the sum of squares of the deviations of the observations about the trend equation is a minimum.

DEMERITS:
1. Compared with the moving average method, it is a cumbersome method.
2. It is not flexible like the moving average method: if some observations are added, the entire calculation has to be done again.
3. It can predict or estimate values only in the immediate future or past.
4. The computation of trend values by this method does not take into account the other components of a time series and is hence not fully reliable.
5. Since the choice of a particular trend form is arbitrary, the method is not, strictly speaking, objective.
6. This method cannot be used to fit growth curves, the pattern followed by most economic and business time series.
MEASUREMENT OF SEASONAL VARIATIONS
The measurement of seasonal variations is done by isolating them from the other components of a time series. Four methods are commonly used for the measurement of seasonal variations:
1. Method of simple averages
2. Ratio to trend method
3. Ratio to moving average method
4. Method of link relatives

1. METHOD OF SIMPLE AVERAGES
This is the easiest and simplest method of studying seasonal variations. It is used when the time series variable consists of only the seasonal and random components. The effect of taking the average of data corresponding to the same period (say the first quarter of each year) is to eliminate the effect of the random component, so the resulting averages consist of only the seasonal component. These averages are then converted into seasonal indices. If the figures are given on a monthly basis, the method involves the following steps:
1. Arrange the raw data by month and year.
2. Find the sum of all the figures relating to each month, i.e. add all the January values for all the years, and repeat the process for every month.
3. Find the average of the monthly figures, i.e. divide each monthly total by the number of years. For example, if data for 5 years are available on a monthly basis, there will be five figures for January; these are totalled and divided by five to get the average figure for January. Obtain such figures for all months; call them X1, X2, X3, ..., X12.
4. Obtain the average of the monthly averages by dividing the sum of the averages by 12:
   X̄ = (X1 + X2 + X3 + ... + X12) / 12
5. Taking the average of the monthly averages as 100, find the percentage of each monthly average. For January (X1) this percentage, i.e. the seasonal index, would be:
   (average of January / average of the monthly averages) × 100 = (X1 / X̄) × 100
If the monthly totals are used instead of the averages, the result is the same.

MERITS AND DEMERITS
This is the simplest method of measuring seasonal variations. However, it is based on the unrealistic assumption that the trend and cyclical variations are absent from the data.
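A minimal Python sketch of the simple-average steps, using hypothetical quarterly data (rows are years, columns are quarters Q1–Q4), is given below.

# Seasonal indices by the method of simple averages (quarterly, hypothetical data).
data = [
    [72, 68, 80, 107],
    [76, 70, 82, 110],
    [74, 66, 84, 112],
]

n_years = len(data)
# Steps 1-3: average the figures of each quarter over the years.
quarter_averages = [sum(year[q] for year in data) / n_years for q in range(4)]
# Step 4: grand average of the quarterly averages.
grand_average = sum(quarter_averages) / 4
# Step 5: express each quarterly average as a percentage of the grand average.
seasonal_indices = [100 * q / grand_average for q in quarter_averages]

print([round(s, 1) for s in seasonal_indices])
print(round(sum(seasonal_indices), 1))   # about 400 for quarterly data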
2. RATIO TO TREND METHOD
This method is used when the cyclical variations are absent from the data, i.e. the time series variable Y consists of the trend, seasonal and random components. Using symbols, we can write Y = T · S · R. The steps in the computation of the seasonal indices are:
1. Obtain the trend values for each month or quarter, etc., by the method of least squares.
2. Divide the original values by the corresponding trend values. This eliminates the trend from the data.
3. Multiply the quotients by 100 to express them as percentages. Thus we have:
   (Y / T) × 100 = (T·S·R / T) × 100 = S·R × 100

MERITS AND DEMERITS
It is an objective method of measuring seasonal variations. However, it is fairly complicated and does not work if cyclical variations are present.

3. RATIO TO MOVING AVERAGE METHOD
The ratio to moving average method is the most commonly used method of measuring seasonal variations. It assumes the presence of all four components of a time series. The steps in the computation of the seasonal indices are as follows:
1. Compute the moving averages with a period equal to the period of the seasonal variations. This eliminates the seasonal component and minimizes the effect of the random component. The resulting moving averages consist of the trend, cyclical and (reduced) random components.
2. Divide the original value for each quarter (or month) by the respective moving average and express the ratio as a percentage, i.e. S·R″ = Y / M.A. = T·C·S·R / T·C·R′, where R′ and R″ denote the changed random components.
3. Finally, the random component R″ is eliminated by the method of simple averages.

MERITS AND DEMERITS
This method assumes that all four components of a time series are present and is therefore widely used for measuring seasonal variations. However, the seasonal variations are not completely eliminated if the cycles of these variations are not of a regular nature. Further, some information is always lost at the ends of the time series.
4. METHOD OF LINK RELATIVES
This method is based on the assumption that the trend is linear and the cyclical variations are of a uniform pattern. The link relatives are percentages of the current period (quarter or month) compared with the previous period. With the computation of the link relatives and their averages, the effect of the cyclical and random components is minimized; further, the trend is eliminated in the process of adjustment of the chained relatives. The following steps are involved in the computation of seasonal indices by this method:
1. Compute the link relative (L.R.) of each period by dividing the figure of that period by the figure of the previous period and multiplying by 100. For example, the link relative of the 3rd quarter = (figure of the 3rd quarter / figure of the 2nd quarter) × 100.
2. Obtain the average of the link relatives of a given quarter (or month) over the various years. Either the arithmetic mean or the median can be used for this purpose; theoretically, the median is preferable because the mean gives undue importance to extreme items.
3. Convert these averages into chained relatives by assuming the chained relative of the first quarter (or month) to be 100. The chained relative (C.R.) for the current period = (C.R. of the previous period × average L.R. of the current period) / 100.
4. Compute the C.R. of the first quarter (or month) on the basis of the last quarter (or month). This is given by (C.R. of the last quarter (month) × average L.R. of the first quarter (month)) / 100.
   (a) This value, in general, differs from 100 because of the long term trend in the data. The chained relatives obtained above therefore have to be adjusted for the effect of this trend. The adjustment factor is:
       d = (1/4) × (new C.R. for the 1st quarter − 100) for quarterly data
       d = (1/12) × (new C.R. for the 1st month − 100) for monthly data
   (b) On the assumption that the trend is linear, d, 2d, 3d, etc. are subtracted from the 2nd, 3rd, 4th, etc. quarter (or month) respectively.
5. Express the adjusted chained relatives as a percentage of their average to obtain the seasonal indices.
6. Make sure that the sum of these indices is 400 for quarterly data and 1200 for monthly data.

MERITS AND DEMERITS
This method is less complicated than the ratio to moving average and the ratio to trend methods. However, it is based on the assumption of a linear trend, which may not always hold true.
MEASUREMENT OF CYCLICAL VARIATIONS
A satisfactory method for the direct measurement of cyclical variations is not available. The main reason is that although these variations may be recurrent, they are seldom of a similar pattern with the same period and amplitude of oscillation. Moreover, in most cases these variations are so intermixed with the random variations that it is very difficult, if not impossible, to separate them. The cyclical variations are therefore often obtained indirectly, as a residue after the elimination of the other components. The steps of the method are given below:
1. Compute the trend values (T) and the seasonal indices (S) by appropriate methods. Here S is taken as a fraction rather than a percentage.
2. Divide the Y-values by the product of the trend and the seasonal index. This ratio consists of the cyclical and random components, i.e. C·R = Y / (T·S).
3. If there are no random variations in the time series, the cyclical variations are given by step (2) above. Otherwise, the random variations should be smoothed out by computing moving averages of the C·R values with an appropriate period. A weighted moving average with suitable weights may also be used for this purpose, if necessary.

MEASUREMENT OF RANDOM VARIATIONS
The random variations are also known as irregular variations. Because of their nature, it is very difficult to devise a formula for their direct computation. Like the cyclical variations, this component can be obtained as a residue after eliminating the effects of the other components.

A time series is a sequence of observations recorded over a certain period of time. A simple example is how temperature changes from day to day or from month to month. This tutorial will give you a solid understanding of what time-series data is, which methods are used to forecast a time series, and what makes time-series data such a special and complex topic in the field of data science.
Table of Contents

• Basics of Time-series Forecasting


• Rolling statistics and stationarity in Time series
• Additive and Multiplicative Time-series
• Exponential Smoothing in Time Series
• Practicals with Time-Series data
o Exponential Smoothing Practicals
o Time series decomposition and stationarity check
• End Notes

Basics of Time-Series Forecasting


Time-series forecasting, in simple words, means predicting a future value (for example, a stock price) over a period of time. There are different approaches to predicting such a value. Consider an example: a company XYZ records its website traffic every hour and now wants to forecast the total traffic of the coming hour. What would your approach to forecasting the upcoming hour's traffic be?

Different people will have different perspectives: one might take the mean of all observations, another the mean of the two most recent observations, another might give more weight to the current observation and less to the past, and yet another might use interpolation. There are many different methods for forecasting the values.

While forecasting time-series values, three important components need to be taken care of, and the main task of time-series forecasting is to capture these three components.

1) Seasonality
Seasonality means that, in a particular domain, there are some months in which the output value peaks compared with other months. For example, if you observe the data of tour and travel companies over the past three years, you will see that values in November and December are much higher because of the holiday and festival season. So while forecasting time-series data we need to capture this seasonality.

2) Trend
The trend is another important component. It describes whether the series is, on the whole, increasing or decreasing, i.e. whether the values (for example, an organization's sales) move upward or downward over a period of time.

3) Unexpected Events
Unexpected events are dynamic changes in an organization or in the market that cannot be anticipated. For example, during the recent pandemic, the Sensex and Nifty charts showed a huge drop in stock prices; that drop was an unexpected event.
There are methods and algorithms with which we can capture seasonality and trend, but unexpected events occur dynamically, so capturing them is very difficult.

Rolling Statistics and Stationarity in Time-series


A stationary time series is one that has a constant mean and constant variance. If I take the mean of T1 and T2 and compare it with the mean of T4 and T5, is it the same, and if it differs, by how much? A constant mean means this difference should be small, and the same holds for the variance.

If the time series is not stationary, we have to make it stationary and then proceed with modelling. Rolling statistics help us make a time series stationary; essentially, rolling statistics calculate a moving average. To calculate the moving average we need to define the window size, which is the number of past values to be considered.

For example, if we take a window of 2, then in the example above the moving average at point T1 will be blank, at point T2 it will be the mean of T1 and T2, at point T3 the mean of T2 and T3, and so on. After calculating all the moving averages, if you plot them alongside the actual values, you will see that the smoothed line is much less jagged than the original.

This is one method of making a time series stationary; there are other methods as well, such as exponential smoothing, which we study next.
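As a concrete illustration of the window-of-2 example above, the same rolling mean can be computed directly with pandas (the five values below are placeholders for T1 to T5):

import pandas as pd

ts = pd.Series([10, 12, 14, 13, 15], index=["T1", "T2", "T3", "T4", "T5"])

# window=2: the first entry is NaN (blank), and every later point is the mean of
# itself and the previous observation, which smooths the series.
rolling_mean = ts.rolling(window=2).mean()
print(rolling_mean)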

Additive and Multiplicative Time series


In the real world we meet different kinds of time-series data. Before studying exponential smoothing we first need to understand two types of time series: additive and multiplicative. As discussed above, there are three components we need to capture: Trend (T), Seasonality (S), and Irregularity (I).

An additive time series is the sum of the trend, seasonality, and irregularity components, while a multiplicative time series is the product of these three terms.
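In equation form, with trend T(t), seasonality S(t), and irregularity I(t), the two types are:

Additive:            Y(t) = T(t) + S(t) + I(t)

Multiplicative:      Y(t) = T(t) * S(t) * I(t)

A common rule of thumb: if the size of the seasonal swings grows with the level of the series, the multiplicative form usually fits better; if the swings stay roughly constant, the additive form is used.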

Exponential Smoothing in Time Series


Exponential smoothing computes a weighted moving average: it considers more of the past values and weights them by recency, so recent observations get more weight than older ones and the prediction becomes more accurate. The formula for simple exponential smoothing can be written as:

yt = α * Xt + (1 − α) * yt−1

Alpha is a hyperparameter that defines how much weight to give to the most recent observation. This is known as simple exponential smoothing. But we also need to capture the trend component, so there is double exponential smoothing, which adds a trend term with only a small modification of the above equation.

yt = α * Xt + (1 − α) * (yt−1 + bt−1)     # level, now including the trend term

where bt = β * (yt − yt−1) + (1 − β) * bt−1     # trend component

Here the trend term bt compares two consecutive smoothed values (yt and yt−1) and blends that change with the previous trend estimate, so this second equation captures the trend factor, as shown in the sketch below.

If we need to capture seasonality as well as trend, we use triple exponential smoothing (Holt-Winters), which adds a seasonal layer on top of the trend equations. With a seasonal period of m, the level and seasonal equations become:

yt = α * (Xt / ct−m) + (1 − α) * (yt−1 + bt−1)

where ct = γ * (Xt / yt) + (1 − γ) * ct−m

Here we are capturing trend as well as seasonality. Using smoothing we can decompose our time-series data, which makes it much easier to work with; real-world time series are complex, so methods like these help make the modelling process smoother. For the triple case, a library implementation is sketched below.
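For the triple (Holt-Winters) case, statsmodels ships a ready-made implementation. A minimal sketch, assuming `data` is a pandas Series of monthly observations (the additive settings and seasonal_periods=12 are illustrative assumptions, not requirements):

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Level + trend + seasonal components, all additive here;
# seasonal_periods=12 assumes monthly data with a yearly season.
hw_fit = ExponentialSmoothing(data, trend='add', seasonal='add',
                              seasonal_periods=12).fit()
print(hw_fit.forecast(6))   # forecast the next 6 periods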

Practicals with Time series forecasting


It's time to get our hands dirty by implementing the concepts we have learned so far. We will implement moving-average and exponential-smoothing methods and compare their output with the original distribution of the data.

Exponential smoothing practicals


The dataset we are using is an electricity-consumption time series, which you can easily find on Kaggle.

Step-1) Load the data first


Python Code:
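A minimal loading sketch is shown below; the file name 'Electric_Production.csv' and the 'DATE' column are assumptions about the downloaded Kaggle file, so adjust them to match your copy:

import pandas as pd
import matplotlib.pyplot as plt

# Assumed file/column names; change them to match the CSV you downloaded from Kaggle.
df = pd.read_csv('Electric_Production.csv', parse_dates=['DATE'], index_col='DATE')
print(df.head())
df.plot(figsize=(10, 5))
plt.show()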

Step-2) Moving Average method


We have already seen how to calculate a moving average with a window; the same applies to our dataset. We compute the rolling statistics and take their mean, and if we then plot the result you can see how much smoother the graph is compared to the original.
rollingseries = df[1:50].rolling(window=5)
rollingmean = rollingseries.mean()   # we can compute any statistical measure here
#print(rollingmean.head(10))
rollingmean.plot(color="red")
plt.show()

Step-3) Simple Exponential Smoothing


As we have seen, simple exponential smoothing has a parameter called alpha, which defines how much weight we want to give to the most recent observation. We will fit two models, one with a high value of alpha and one with a low value, and compare the two.

from statsmodels.tsa.holtwinters import SimpleExpSmoothing

data = df[1:50]
fit1 = SimpleExpSmoothing(data).fit(smoothing_level=0.2, optimized=False)
fit2 = SimpleExpSmoothing(data).fit(smoothing_level=0.8, optimized=False)
plt.figure(figsize=(18, 8))
plt.plot(df[1:50], marker='o', color="black")
plt.plot(fit1.fittedvalues, marker="o", color="b")
plt.plot(fit2.fittedvalues, marker="o", color="r")
plt.xticks(rotation="vertical")
plt.show()
Step-4) Holt method for exponential smoothing
Holt's method is a popular exponential-smoothing technique, also known as linear exponential smoothing, that forecasts the data together with its trend. It works on three separate equations (a forecast equation plus two smoothing equations) that together generate the final forecast. Let us apply it to our data and observe the changes. In the first fit we assume a linear trend in the data, and in the second fit we use an exponential trend.

from statsmodels.tsa.holtwinters import Holt

fit1 = Holt(data).fit()                      # linear trend
fit2 = Holt(data, exponential=True).fit()    # exponential trend
plt.plot(data, marker='o', color='black')
plt.plot(fit1.fittedvalues, marker='o', color='b')
plt.plot(fit2.fittedvalues, marker='o', color='r')
plt.xticks(rotation="vertical")
plt.show()
You can observe that the blue plot (linear trend) does not follow the original series as closely, whereas the red plot shows the exponential-trend fit. This is simple smoothing with the Holt method; we can also pass parameters such as alpha, the trend component, and (with Holt-Winters) a seasonality component explicitly, as in the sketch below.
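As an example of passing those parameters explicitly, Holt's model accepts the level and trend smoothing coefficients at fit time. A small sketch (the 0.8/0.2 values are arbitrary, and the smoothing_trend argument assumes a reasonably recent statsmodels version):

from statsmodels.tsa.holtwinters import Holt
import matplotlib.pyplot as plt

# Fix alpha (level) and beta (trend) instead of letting statsmodels optimise them.
fit3 = Holt(data).fit(smoothing_level=0.8, smoothing_trend=0.2, optimized=False)
plt.plot(data, marker='o', color='black')
plt.plot(fit3.fittedvalues, marker='o', color='g')
plt.xticks(rotation="vertical")
plt.show()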

Decomposition and stationarity check practicals


Now we will check which type of time-series data we have, whether it is additive or multiplicative. We will use a different dataset this time, a drug sales dataset, which you can download from here.

Step-1) Load dataset


If you observe the plot produced by the code below, you can see an upward trend in the data, but no obvious seasonality.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

DrugSalesData = pd.read_csv('TimeSeries.csv', parse_dates=['Date'],
                            index_col='Date')
DrugSalesData.reset_index(inplace=True)

plt.rcParams.update({'figure.figsize': (10, 6)})
plt.plot(DrugSalesData['Value'])
Step-2) Decomposition of time-series data
Now we will decompose the time series with both an additive and a multiplicative model and visualize the trend and seasonal components each of them extracts.

# Additive Decomposition
add_result = seasonal_decompose(DrugSalesData['Value'],
model='additive',period=1)
# Multiplicative Decomposition
mul_result = seasonal_decompose(DrugSalesData['Value'],
model='multiplicative',period=1)

We imported the seasonal_decompose function from statsmodels and called it once with model='additive' and once with model='multiplicative'. Now let us visualize the result of each model one by one, starting with the additive decomposition.

add_result.plot().suptitle('\nAdditive Decompose', fontsize=12)
plt.show()
If you observe the output you will get four plots: the observed values, the trend, the seasonality, and the residual. We can see that the trend is clearly present with both decomposition methods, while the seasonal component shows no variation here (we used period=1, so no seasonal pattern is extracted).
We may also want to see the actual numerical values of the trend and seasonality. For that we will prepare a dataframe with four columns, one for each plot. Let us build it for the additive decomposition; you can do the same for the multiplicative one.

new_df_add = pd.concat([add_result.seasonal, add_result.trend,
                        add_result.resid, add_result.observed], axis=1)
new_df_add.columns = ['seasonality', 'trend', 'residual', 'actual_values']
new_df_add.head()

Step-3) ADF (Augmented Dickey-Fuller) test for stationarity

Stationarity means a constant mean and constant variance. The ADF test is a hypothesis test that tells us whether the time series is stationary. The null hypothesis is that the time series is non-stationary. If the p-value is less than 5 percent, we reject the null hypothesis (the series is stationary); otherwise, we fail to reject it and treat the series as non-stationary.

from statsmodels.tsa.stattools import adfuller

adfuller_result = adfuller(DrugSalesData.Value.values, autolag='AIC')
print(f'ADF Statistic: {adfuller_result[0]}')
print(f'p-value: {adfuller_result[1]}')
print('Critical Values:')
for key, value in adfuller_result[4].items():
    print(f'   {key}: {value}')

The p-value is greater than 5 percent, so the series is non-stationary, and we cannot build a model on non-stationary data; we have to make the time series stationary first. There are different techniques for this, such as differencing, together with diagnostic tools like the ACF and PACF plots, which we will cover in the second part of this article.
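The most common first step is differencing: subtract each observation from the previous one and re-run the ADF test. A minimal sketch on the same drug sales series:

from statsmodels.tsa.stattools import adfuller

# First-order differencing removes a linear trend; it can be repeated if needed.
diff_values = DrugSalesData['Value'].diff().dropna()

adf_diff = adfuller(diff_values, autolag='AIC')
print(f'ADF Statistic (differenced): {adf_diff[0]}')
print(f'p-value (differenced): {adf_diff[1]}')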

End Notes

We have seen what time-series data is and what makes time-series analysis a special and complex task in machine learning. We also performed practicals on how to start working with time-series data, how to run various analyses, and how to draw inferences from them. In the upcoming part we will discuss various methods for making a time series stationary, as well as classical time-series models such as ARIMA and SARIMA.

I hope it was easy to follow to the end. Handling time-series data is a little complex, but after reading this article you should have the understanding and confidence to work with it. If you have any queries, please post them in the comment section below.

(Source: https://www.analyticsvidhya.com/blog/2021/07/time-series-forecasting-complete-tutorial-part-1/)

This tutorial is an introduction to time series forecasting using TensorFlow. It builds a


few different styles of models including Convolutional and Recurrent Neural Networks
(CNNs and RNNs).

This is covered in two main parts, with subsections:

• Forecast for a single time step:

• A single feature.
• All features.
• Forecast multiple steps:

• Single-shot: Make the predictions all at once.

• Autoregressive: Make one prediction at a time and feed the output back to the
model.

Setup
import os
import datetime

import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf

mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False

The weather dataset

This tutorial uses a weather time series dataset recorded by the Max Planck Institute for
Biogeochemistry.

This dataset contains 14 different features such as air temperature, atmospheric


pressure, and humidity. These were collected every 10 minutes, beginning in 2003. For
efficiency, you will use only the data collected between 2009 and 2016. This section of
the dataset was prepared by François Chollet for his book Deep Learning with Python.

zip_path = tf.keras.utils.get_file(
origin='https://storage.googleapis.com/tensorflow/tf-keras-
datasets/jena_climate_2009_2016.csv.zip',
fname='jena_climate_2009_2016.csv.zip',
extract=True)
csv_path, _ = os.path.splitext(zip_path)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-


datasets/jena_climate_2009_2016.csv.zip
13568290/13568290 [==============================] - 0s 0us/step
This tutorial will just deal with hourly predictions, so start by sub-sampling the data
from 10-minute intervals to one-hour intervals:

df = pd.read_csv(csv_path)
# Slice [start:stop:step], starting from index 5 take every 6th record.
df = df[5::6]

date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y


%H:%M:%S')

Let's take a glance at the data. Here are the first few rows:

df.head()

Here is the evolution of a few features over time:

plot_cols = ['T (degC)', 'p (mbar)', 'rho (g/m**3)']


plot_features = df[plot_cols]
plot_features.index = date_time
_ = plot_features.plot(subplots=True)

plot_features = df[plot_cols][:480]
plot_features.index = date_time[:480]
_ = plot_features.plot(subplots=True)
Inspect and cleanup

Next, look at the statistics of the dataset:

df.describe().transpose()

Wind velocity

One thing that should stand out is the min value of the wind velocity (wv (m/s)) and the
maximum value (max. wv (m/s)) columns. This -9999 is likely erroneous.

There's a separate wind direction column, so the velocity should be greater than zero
(>=0). Replace it with zeros:

wv = df['wv (m/s)']
bad_wv = wv == -9999.0
wv[bad_wv] = 0.0

max_wv = df['max. wv (m/s)']


bad_max_wv = max_wv == -9999.0
max_wv[bad_max_wv] = 0.0

# The above inplace edits are reflected in the DataFrame.


df['wv (m/s)'].min()

0.0
Feature engineering

Before diving in to build a model, it's important to understand your data and be sure that
you're passing the model appropriately formatted data.

Wind

The last column of the data, wd (deg)—gives the wind direction in units of degrees.
Angles do not make good model inputs: 360° and 0° should be close to each other and
wrap around smoothly. Direction shouldn't matter if the wind is not blowing.

Right now the distribution of wind data looks like this:

plt.hist2d(df['wd (deg)'], df['wv (m/s)'], bins=(50, 50), vmax=400)


plt.colorbar()
plt.xlabel('Wind Direction [deg]')
plt.ylabel('Wind Velocity [m/s]')

Text(0, 0.5, 'Wind Velocity [m/s]')


But this will be easier for the model to interpret if you convert the wind direction and
velocity columns to a wind vector:

wv = df.pop('wv (m/s)')
max_wv = df.pop('max. wv (m/s)')

# Convert to radians.
wd_rad = df.pop('wd (deg)')*np.pi / 180

# Calculate the wind x and y components.


df['Wx'] = wv*np.cos(wd_rad)
df['Wy'] = wv*np.sin(wd_rad)

# Calculate the max wind x and y components.


df['max Wx'] = max_wv*np.cos(wd_rad)
df['max Wy'] = max_wv*np.sin(wd_rad)
The distribution of wind vectors is much simpler for the model to correctly interpret:

plt.hist2d(df['Wx'], df['Wy'], bins=(50, 50), vmax=400)


plt.colorbar()
plt.xlabel('Wind X [m/s]')
plt.ylabel('Wind Y [m/s]')
ax = plt.gca()
ax.axis('tight')

(-11.305513973134667, 8.24469928549079, -8.27438540335515, 7.7338312955467785)

Time

Similarly, the Date Time column is very useful, but not in this string form. Start by
converting it to seconds:
timestamp_s = date_time.map(pd.Timestamp.timestamp)

Similar to the wind direction, the time in seconds is not a useful model input.
Being weather data, it has clear daily and yearly periodicity. There are many ways
you could deal with periodicity.

You can get usable signals by using sine and cosine transforms to clear "Time of
day" and "Time of year" signals:

day = 24*60*60
year = (365.2425)*day

df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day))


df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day))
df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year))
df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))

plt.plot(np.array(df['Day sin'])[:25])
plt.plot(np.array(df['Day cos'])[:25])
plt.xlabel('Time [h]')
plt.title('Time of day signal')

Text(0.5, 1.0, 'Time of day signal')


This gives the model access to the most important frequency features. In this case
you knew ahead of time which frequencies were important.

If you don't have that information, you can determine which frequencies are
important by extracting features with Fast Fourier Transform. To check the
assumptions, here is the tf.signal.rfft of the temperature over time. Note the
obvious peaks at frequencies near 1/year and 1/day:

fft = tf.signal.rfft(df['T (degC)'])


f_per_dataset = np.arange(0, len(fft))

n_samples_h = len(df['T (degC)'])


hours_per_year = 24*365.2524
years_per_dataset = n_samples_h/(hours_per_year)

f_per_year = f_per_dataset/years_per_dataset
plt.step(f_per_year, np.abs(fft))
plt.xscale('log')
plt.ylim(0, 400000)
plt.xlim([0.1, max(plt.xlim())])
plt.xticks([1, 365.2524], labels=['1/Year', '1/day'])
_ = plt.xlabel('Frequency (log scale)')

Split the data

You'll use a (70%, 20%, 10%) split for the training, validation, and test sets. Note
the data is not being randomly shuffled before splitting. This is for two reasons:

1. It ensures that chopping the data into windows of consecutive samples is still
possible.
2. It ensures that the validation/test results are more realistic, being evaluated
on the data collected after the model was trained.
column_indices = {name: i for i, name in enumerate(df.columns)}

n = len(df)
train_df = df[0:int(n*0.7)]
val_df = df[int(n*0.7):int(n*0.9)]
test_df = df[int(n*0.9):]

num_features = df.shape[1]

Normalize the data

It is important to scale features before training a neural network. Normalization is a


common way of doing this scaling: subtract the mean and divide by the standard
deviation of each feature.

The mean and standard deviation should only be computed using the training data
so that the models have no access to the values in the validation and test sets.

It's also arguable that the model shouldn't have access to future values in the
training set when training, and that this normalization should be done using
moving averages. That's not the focus of this tutorial, and the validation and test
sets ensure that you get (somewhat) honest metrics. So, in the interest of simplicity
this tutorial uses a simple average.

train_mean = train_df.mean()
train_std = train_df.std()

train_df = (train_df - train_mean) / train_std


val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std

Now, peek at the distribution of the features. Some features do have long tails, but
there are no obvious errors like the -9999 wind velocity value.

df_std = (df - train_mean) / train_std


df_std = df_std.melt(var_name='Column', value_name='Normalized')
plt.figure(figsize=(12, 6))
ax = sns.violinplot(x='Column', y='Normalized', data=df_std)
_ = ax.set_xticklabels(df.keys(), rotation=90)
Data windowing

The models in this tutorial will make a set of predictions based on a window of
consecutive samples from the data.

The main features of the input windows are:

• The width (number of time steps) of the input and label windows.
• The time offset between them.
• Which features are used as inputs, labels, or both.

This tutorial builds a variety of models (including Linear, DNN, CNN and RNN
models), and uses them for both:

• Single-output, and multi-output predictions.


• Single-time-step and multi-time-step predictions.

This section focuses on implementing the data windowing so that it can be reused
for all of those models.
Depending on the task and type of model you may want to generate a variety of
data windows. Here are some examples:

1. For example, to make a single prediction 24 hours into the future, given 24
hours of history, you might define a window like this:

2. A model that makes a prediction one hour into the future, given six hours of
history, would need a window like this:

The rest of this section defines a WindowGenerator class. This class can:

1. Handle the indexes and offsets as shown in the diagrams above.


2. Split windows of features into (features, labels) pairs.
3. Plot the content of the resulting windows.
4. Efficiently generate batches of these windows from the training, evaluation,
and test data, using tf.data.Datasets.
1. Indexes and offsets

Start by creating the WindowGenerator class. The __init__ method includes all the
necessary logic for the input and label indices.

It also takes the training, evaluation, and test DataFrames as input. These will be
converted to tf.data.Datasets of windows later.

class WindowGenerator():
def __init__(self, input_width, label_width, shift,
train_df=train_df, val_df=val_df, test_df=test_df,
label_columns=None):
# Store the raw data.
self.train_df = train_df
self.val_df = val_df
self.test_df = test_df

# Work out the label column indices.


self.label_columns = label_columns
if label_columns is not None:
self.label_columns_indices = {name: i for i, name in
enumerate(label_columns)}
self.column_indices = {name: i for i, name in
enumerate(train_df.columns)}

# Work out the window parameters.


self.input_width = input_width
self.label_width = label_width
self.shift = shift

self.total_window_size = input_width + shift

self.input_slice = slice(0, input_width)


self.input_indices = np.arange(self.total_window_size)[self.input_slice]

self.label_start = self.total_window_size - self.label_width


self.labels_slice = slice(self.label_start, None)
self.label_indices = np.arange(self.total_window_size)[self.labels_slice]
def __repr__(self):
return '\n'.join([
f'Total window size: {self.total_window_size}',
f'Input indices: {self.input_indices}',
f'Label indices: {self.label_indices}',
f'Label column name(s): {self.label_columns}'])

Here is code to create the 2 windows shown in the diagrams at the start of this
section:

w1 = WindowGenerator(input_width=24, label_width=1, shift=24,


label_columns=['T (degC)'])
w1

Total window size: 48


Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
23]
Label indices: [47]
Label column name(s): ['T (degC)']
w2 = WindowGenerator(input_width=6, label_width=1, shift=1,
label_columns=['T (degC)'])
w2

Total window size: 7


Input indices: [0 1 2 3 4 5]
Label indices: [6]
Label column name(s): ['T (degC)']
2. Split

Given a list of consecutive inputs, the split_window method will convert them to a
window of inputs and a window of labels.

The example w2 you define earlier will be split like this:


This diagram doesn't show the features axis of the data, but
this split_window function also handles the label_columns so it can be used for
both the single output and multi-output examples.

def split_window(self, features):


inputs = features[:, self.input_slice, :]
labels = features[:, self.labels_slice, :]
if self.label_columns is not None:
labels = tf.stack(
[labels[:, :, self.column_indices[name]] for name in self.label_columns],
axis=-1)

# Slicing doesn't preserve static shape information, so set the shapes


# manually. This way the `tf.data.Datasets` are easier to inspect.
inputs.set_shape([None, self.input_width, None])
labels.set_shape([None, self.label_width, None])

return inputs, labels

WindowGenerator.split_window = split_window

Try it out:
# Stack three slices, the length of the total window.
example_window = tf.stack([np.array(train_df[:w2.total_window_size]),
np.array(train_df[100:100+w2.total_window_size]),
np.array(train_df[200:200+w2.total_window_size])])

example_inputs, example_labels = w2.split_window(example_window)

print('All shapes are: (batch, time, features)')


print(f'Window shape: {example_window.shape}')
print(f'Inputs shape: {example_inputs.shape}')
print(f'Labels shape: {example_labels.shape}')

All shapes are: (batch, time, features)


Window shape: (3, 7, 19)
Inputs shape: (3, 6, 19)
Labels shape: (3, 1, 1)

Typically, data in TensorFlow is packed into arrays where the outermost index is
across examples (the "batch" dimension). The middle indices are the "time" or
"space" (width, height) dimension(s). The innermost indices are the features.

The code above took a batch of three 7-time step windows with 19 features at each
time step. It splits them into a batch of 6-time step 19-feature inputs, and a 1-time
step 1-feature label. The label only has one feature because
the WindowGenerator was initialized with label_columns=['T (degC)']. Initially,
this tutorial will build models that predict single output labels.

3. Plot

Here is a plot method that allows a simple visualization of the split window:

w2.example = example_inputs, example_labels

def plot(self, model=None, plot_col='T (degC)', max_subplots=3):


inputs, labels = self.example
plt.figure(figsize=(12, 8))
plot_col_index = self.column_indices[plot_col]
max_n = min(max_subplots, len(inputs))
for n in range(max_n):
plt.subplot(max_n, 1, n+1)
plt.ylabel(f'{plot_col} [normed]')
plt.plot(self.input_indices, inputs[n, :, plot_col_index],
label='Inputs', marker='.', zorder=-10)

if self.label_columns:
label_col_index = self.label_columns_indices.get(plot_col, None)
else:
label_col_index = plot_col_index

if label_col_index is None:
continue

plt.scatter(self.label_indices, labels[n, :, label_col_index],


edgecolors='k', label='Labels', c='#2ca02c', s=64)
if model is not None:
predictions = model(inputs)
plt.scatter(self.label_indices, predictions[n, :, label_col_index],
marker='X', edgecolors='k', label='Predictions',
c='#ff7f0e', s=64)

if n == 0:
plt.legend()

plt.xlabel('Time [h]')

WindowGenerator.plot = plot

This plot aligns inputs, labels, and (later) predictions based on the time that the
item refers to:

w2.plot()
You can plot the other columns, but the example window w2 configuration only
has labels for the T (degC) column.

w2.plot(plot_col='p (mbar)')
4. Create tf.data.Datasets

Finally, this make_dataset method will take a time series DataFrame and convert it
to a tf.data.Dataset of (input_window, label_window) pairs using
the tf.keras.utils.timeseries_dataset_from_array function:

def make_dataset(self, data):


data = np.array(data, dtype=np.float32)
ds = tf.keras.utils.timeseries_dataset_from_array(
data=data,
targets=None,
sequence_length=self.total_window_size,
sequence_stride=1,
shuffle=True,
batch_size=32,)

ds = ds.map(self.split_window)

return ds
WindowGenerator.make_dataset = make_dataset

The WindowGenerator object holds training, validation, and test data.

Add properties for accessing them as tf.data.Datasets using


the make_dataset method you defined earlier. Also, add a standard example batch
for easy access and plotting:

@property
def train(self):
return self.make_dataset(self.train_df)

@property
def val(self):
return self.make_dataset(self.val_df)

@property
def test(self):
return self.make_dataset(self.test_df)

@property
def example(self):
"""Get and cache an example batch of `inputs, labels` for plotting."""
result = getattr(self, '_example', None)
if result is None:
# No example batch was found, so get one from the `.train` dataset
result = next(iter(self.train))
# And cache it for next time
self._example = result
return result

WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example

Now, the WindowGenerator object gives you access to the tf.data.Dataset objects,
so you can easily iterate over the data.
The Dataset.element_spec property tells you the structure, data types, and shapes
of the dataset elements.

# Each element is an (inputs, label) pair.


w2.train.element_spec

(TensorSpec(shape=(None, 6, 19), dtype=tf.float32, name=None),


TensorSpec(shape=(None, 1, 1), dtype=tf.float32, name=None))

Iterating over a Dataset yields concrete batches:

for example_inputs, example_labels in w2.train.take(1):


print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
print(f'Labels shape (batch, time, features): {example_labels.shape}')

Inputs shape (batch, time, features): (32, 6, 19)


Labels shape (batch, time, features): (32, 1, 1)
Single step models

The simplest model you can build on this sort of data is one that predicts a single
feature's value—1 time step (one hour) into the future based only on the current
conditions.

So, start by building models to predict the T (degC) value one hour into the future.
Configure a WindowGenerator object to produce these single-step (input,
label) pairs:

single_step_window = WindowGenerator(
input_width=1, label_width=1, shift=1,
label_columns=['T (degC)'])
single_step_window

Total window size: 2


Input indices: [0]
Label indices: [1]
Label column name(s): ['T (degC)']

The window object creates tf.data.Datasets from the training, validation, and test
sets, allowing you to easily iterate over batches of data.

for example_inputs, example_labels in single_step_window.train.take(1):


print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
print(f'Labels shape (batch, time, features): {example_labels.shape}')

Inputs shape (batch, time, features): (32, 1, 19)


Labels shape (batch, time, features): (32, 1, 1)
Baseline

Before building a trainable model it would be good to have a performance baseline


as a point for comparison with the later more complicated models.

This first task is to predict temperature one hour into the future, given the current
value of all features. The current values include the current temperature.

So, start with a model that just returns the current temperature as the prediction,
predicting "No change". This is a reasonable baseline since temperature changes
slowly. Of course, this baseline will work less well if you make a prediction further
in the future.

class Baseline(tf.keras.Model):
def __init__(self, label_index=None):
super().__init__()
self.label_index = label_index

def call(self, inputs):


if self.label_index is None:
return inputs
result = inputs[:, :, self.label_index]
return result[:, :, tf.newaxis]

Instantiate and evaluate this model:


baseline = Baseline(label_index=column_indices['T (degC)'])

baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
metrics=[tf.keras.metrics.MeanAbsoluteError()])

val_performance = {}
performance = {}
val_performance['Baseline'] = baseline.evaluate(single_step_window.val)
performance['Baseline'] = baseline.evaluate(single_step_window.test, verbose=0)

439/439 [==============================] - 1s 2ms/step - loss: 0.0128 -


mean_absolute_error: 0.0785

That printed some performance metrics, but those don't give you a feeling for how
well the model is doing.

The WindowGenerator has a plot method, but the plots won't be very interesting
with only a single sample.

So, create a wider WindowGenerator that generates windows 24 hours of


consecutive inputs and labels at a time. The new wide_window variable doesn't
change the way the model operates. The model still makes predictions one hour
into the future based on a single input time step. Here, the time axis acts like
the batch axis: each prediction is made independently with no interaction between
time steps:

wide_window = WindowGenerator(
input_width=24, label_width=24, shift=1,
label_columns=['T (degC)'])

wide_window

Total window size: 25


Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
23]
Label indices: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24]
Label column name(s): ['T (degC)']
This expanded window can be passed directly to the same baseline model without
any code changes. This is possible because the inputs and labels have the same
number of time steps, and the baseline just forwards the input to the output:

print('Input shape:', wide_window.example[0].shape)


print('Output shape:', baseline(wide_window.example[0]).shape)

Input shape: (32, 24, 19)


Output shape: (32, 24, 1)

By plotting the baseline model's predictions, notice that it is simply the labels
shifted right by one hour:

wide_window.plot(baseline)
In the above plots of three examples the single step model is run over the course of
24 hours. This deserves some explanation:

• The blue Inputs line shows the input temperature at each time step. The
model receives all features, this plot only shows the temperature.
• The green Labels dots show the target prediction value. These dots are
shown at the prediction time, not the input time. That is why the range of
labels is shifted 1 step relative to the inputs.
• The orange Predictions crosses are the model's prediction's for each output
time step. If the model were predicting perfectly the predictions would land
directly on the Labels.
Linear model

The simplest trainable model you can apply to this task is to insert a linear transformation between the input and output. In this case the output from a time step only depends on that step:
A tf.keras.layers.Dense layer with no activation set is a linear model. The layer
only transforms the last axis of the data from (batch, time, inputs) to (batch, time,
units); it is applied independently to every item across the batch and time axes.

linear = tf.keras.Sequential([
tf.keras.layers.Dense(units=1)
])

print('Input shape:', single_step_window.example[0].shape)


print('Output shape:', linear(single_step_window.example[0]).shape)

Input shape: (32, 1, 19)


Output shape: (32, 1, 1)

This tutorial trains many models, so package the training procedure into a function:

MAX_EPOCHS = 20

def compile_and_fit(model, window, patience=2):


early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
patience=patience,
mode='min')
model.compile(loss=tf.keras.losses.MeanSquaredError(),
optimizer=tf.keras.optimizers.Adam(),
metrics=[tf.keras.metrics.MeanAbsoluteError()])

history = model.fit(window.train, epochs=MAX_EPOCHS,


validation_data=window.val,
callbacks=[early_stopping])
return history

Train the model and evaluate its performance:

history = compile_and_fit(linear, single_step_window)

val_performance['Linear'] = linear.evaluate(single_step_window.val)
performance['Linear'] = linear.evaluate(single_step_window.test, verbose=0)

Epoch 1/20
1534/1534 [==============================] - 5s 3ms/step - loss: 0.2398
- mean_absolute_error: 0.2786 - val_loss: 0.0124 - val_mean_absolute_error:
0.0838
Epoch 2/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0111
- mean_absolute_error: 0.0786 - val_loss: 0.0102 - val_mean_absolute_error:
0.0757
Epoch 3/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0097
- mean_absolute_error: 0.0730 - val_loss: 0.0091 - val_mean_absolute_error:
0.0712
Epoch 4/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0092
- mean_absolute_error: 0.0705 - val_loss: 0.0088 - val_mean_absolute_error:
0.0695
Epoch 5/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091
- mean_absolute_error: 0.0699 - val_loss: 0.0089 - val_mean_absolute_error:
0.0701
Epoch 6/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091
- mean_absolute_error: 0.0698 - val_loss: 0.0088 - val_mean_absolute_error:
0.0696
Epoch 7/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091
- mean_absolute_error: 0.0697 - val_loss: 0.0088 - val_mean_absolute_error:
0.0694
Epoch 8/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090
- mean_absolute_error: 0.0697 - val_loss: 0.0087 - val_mean_absolute_error:
0.0688
Epoch 9/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090
- mean_absolute_error: 0.0697 - val_loss: 0.0087 - val_mean_absolute_error:
0.0696
Epoch 10/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090
- mean_absolute_error: 0.0696 - val_loss: 0.0088 - val_mean_absolute_error:
0.0692
Epoch 11/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090
- mean_absolute_error: 0.0696 - val_loss: 0.0087 - val_mean_absolute_error:
0.0691
Epoch 12/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090
- mean_absolute_error: 0.0696 - val_loss: 0.0088 - val_mean_absolute_error:
0.0699
Epoch 13/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090
- mean_absolute_error: 0.0695 - val_loss: 0.0088 - val_mean_absolute_error:
0.0697
439/439 [==============================] - 1s 2ms/step - loss: 0.0088 -
mean_absolute_error: 0.0697

Like the baseline model, the linear model can be called on batches of wide
windows. Used this way the model makes a set of independent predictions on
consecutive time steps. The time axis acts like another batch axis. There are no
interactions between the predictions at each time step.
print('Input shape:', wide_window.example[0].shape)
print('Output shape:', linear(wide_window.example[0]).shape)

Input shape: (32, 24, 19)


Output shape: (32, 24, 1)

Here is the plot of its example predictions on the wide_window, note how in many
cases the prediction is clearly better than just returning the input temperature, but
in a few cases it's worse:

wide_window.plot(linear)
One advantage to linear models is that they're relatively simple to interpret. You
can pull out the layer's weights and visualize the weight assigned to each input:

plt.bar(x = range(len(train_df.columns)),
height=linear.layers[0].kernel[:,0].numpy())
axis = plt.gca()
axis.set_xticks(range(len(train_df.columns)))
_ = axis.set_xticklabels(train_df.columns, rotation=90)
Sometimes the model doesn't even place the most weight on the input T (degC).
This is one of the risks of random initialization.

Dense

Before applying models that actually operate on multiple time-steps, it's worth
checking the performance of deeper, more powerful, single input step models.

Here's a model similar to the linear model, except it stacks several Dense layers between the input and the output:

dense = tf.keras.Sequential([
tf.keras.layers.Dense(units=64, activation='relu'),
tf.keras.layers.Dense(units=64, activation='relu'),
tf.keras.layers.Dense(units=1)
])

history = compile_and_fit(dense, single_step_window)

val_performance['Dense'] = dense.evaluate(single_step_window.val)
performance['Dense'] = dense.evaluate(single_step_window.test, verbose=0)

Epoch 1/20
1534/1534 [==============================] - 7s 4ms/step - loss: 0.0177
- mean_absolute_error: 0.0793 - val_loss: 0.0080 - val_mean_absolute_error:
0.0655
Epoch 2/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0079
- mean_absolute_error: 0.0648 - val_loss: 0.0072 - val_mean_absolute_error:
0.0608
Epoch 3/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0076
- mean_absolute_error: 0.0630 - val_loss: 0.0070 - val_mean_absolute_error:
0.0596
Epoch 4/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0073
- mean_absolute_error: 0.0611 - val_loss: 0.0065 - val_mean_absolute_error:
0.0566
Epoch 5/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0070
- mean_absolute_error: 0.0600 - val_loss: 0.0070 - val_mean_absolute_error:
0.0588
Epoch 6/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0069
- mean_absolute_error: 0.0589 - val_loss: 0.0075 - val_mean_absolute_error:
0.0636
439/439 [==============================] - 1s 2ms/step - loss: 0.0075 -
mean_absolute_error: 0.0636
Multi-step dense

A single-time-step model has no context for the current values of its inputs. It can't
see how the input features are changing over time. To address this issue the model
needs access to multiple time steps when making predictions:
The baseline, linear and dense models handled each time step independently. Here
the model will take multiple time steps as input to produce a single output.

Create a WindowGenerator that will produce batches of three-hour inputs and one-
hour labels:

Note that the Window's shift parameter is relative to the end of the two windows.

CONV_WIDTH = 3
conv_window = WindowGenerator(
input_width=CONV_WIDTH,
label_width=1,
shift=1,
label_columns=['T (degC)'])

conv_window

Total window size: 4


Input indices: [0 1 2]
Label indices: [3]
Label column name(s): ['T (degC)']
conv_window.plot()
plt.title("Given 3 hours of inputs, predict 1 hour into the future.")

Text(0.5, 1.0, 'Given 3 hours of inputs, predict 1 hour into the future.')

You could train a dense model on a multiple-input-step window by adding


a tf.keras.layers.Flatten as the first layer of the model:

multi_step_dense = tf.keras.Sequential([
# Shape: (time, features) => (time*features)
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(units=32, activation='relu'),
tf.keras.layers.Dense(units=32, activation='relu'),
tf.keras.layers.Dense(units=1),
# Add back the time dimension.
# Shape: (outputs) => (1, outputs)
tf.keras.layers.Reshape([1, -1]),
])
print('Input shape:', conv_window.example[0].shape)
print('Output shape:', multi_step_dense(conv_window.example[0]).shape)

Input shape: (32, 3, 19)


Output shape: (32, 1, 1)
history = compile_and_fit(multi_step_dense, conv_window)

IPython.display.clear_output()
val_performance['Multi step dense'] =
multi_step_dense.evaluate(conv_window.val)
performance['Multi step dense'] = multi_step_dense.evaluate(conv_window.test,
verbose=0)

438/438 [==============================] - 1s 2ms/step - loss: 0.0072 -


mean_absolute_error: 0.0618
conv_window.plot(multi_step_dense)

The main down-side of this approach is that the resulting model can only be
executed on input windows of exactly this shape.
print('Input shape:', wide_window.example[0].shape)
try:
print('Output shape:', multi_step_dense(wide_window.example[0]).shape)
except Exception as e:
print(f'\n{type(e).__name__}:{e}')

Input shape: (32, 24, 19)

ValueError:Exception encountered when calling layer 'sequential_2' (type


Sequential).

Input 0 of layer "dense_4" is incompatible with the layer: expected axis -1 of input
shape to have value 57, but received input with shape (32, 456)

Call arguments received by layer 'sequential_2' (type Sequential):


• inputs=tf.Tensor(shape=(32, 24, 19), dtype=float32)
• training=None
• mask=None

The convolutional models in the next section fix this problem.

Convolution neural network

A convolution layer (tf.keras.layers.Conv1D) also takes multiple time steps as


input to each prediction.

Below is the same model as multi_step_dense, re-written with a convolution.

Note the changes:

• The tf.keras.layers.Flatten and the first tf.keras.layers.Dense are replaced by


a tf.keras.layers.Conv1D.
• The tf.keras.layers.Reshape is no longer necessary since the convolution
keeps the time axis in its output.
conv_model = tf.keras.Sequential([
tf.keras.layers.Conv1D(filters=32,
kernel_size=(CONV_WIDTH,),
activation='relu'),
tf.keras.layers.Dense(units=32, activation='relu'),
tf.keras.layers.Dense(units=1),
])

Run it on an example batch to check that the model produces outputs with the
expected shape:

print("Conv model on `conv_window`")


print('Input shape:', conv_window.example[0].shape)
print('Output shape:', conv_model(conv_window.example[0]).shape)

Conv model on `conv_window`


Input shape: (32, 3, 19)
Output shape: (32, 1, 1)

Train and evaluate it on the conv_window and it should give performance similar
to the multi_step_dense model.

history = compile_and_fit(conv_model, conv_window)

IPython.display.clear_output()
val_performance['Conv'] = conv_model.evaluate(conv_window.val)
performance['Conv'] = conv_model.evaluate(conv_window.test, verbose=0)

438/438 [==============================] - 1s 2ms/step - loss: 0.0061 -


mean_absolute_error: 0.0546

The difference between this conv_model and the multi_step_dense model is that
the conv_model can be run on inputs of any length. The convolutional layer is
applied to a sliding window of inputs:
If you run it on wider input, it produces wider output:

print("Wide window")
print('Input shape:', wide_window.example[0].shape)
print('Labels shape:', wide_window.example[1].shape)
print('Output shape:', conv_model(wide_window.example[0]).shape)

Wide window
Input shape: (32, 24, 19)
Labels shape: (32, 24, 1)
Output shape: (32, 22, 1)

Note that the output is shorter than the input. To make training or plotting work,
you need the labels, and prediction to have the same length. So build
a WindowGenerator to produce wide windows with a few extra input time steps so
the label and prediction lengths match:

LABEL_WIDTH = 24
INPUT_WIDTH = LABEL_WIDTH + (CONV_WIDTH - 1)
wide_conv_window = WindowGenerator(
input_width=INPUT_WIDTH,
label_width=LABEL_WIDTH,
shift=1,
label_columns=['T (degC)'])

wide_conv_window

Total window size: 27


Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25]
Label indices: [ 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26]
Label column name(s): ['T (degC)']
print("Wide conv window")
print('Input shape:', wide_conv_window.example[0].shape)
print('Labels shape:', wide_conv_window.example[1].shape)
print('Output shape:', conv_model(wide_conv_window.example[0]).shape)

Wide conv window


Input shape: (32, 26, 19)
Labels shape: (32, 24, 1)
Output shape: (32, 24, 1)

Now, you can plot the model's predictions on a wider window. Note the 3 input
time steps before the first prediction. Every prediction here is based on the 3
preceding time steps:

wide_conv_window.plot(conv_model)
Recurrent neural network

A Recurrent Neural Network (RNN) is a type of neural network well-suited to time


series data. RNNs process a time series step-by-step, maintaining an internal state
from time-step to time-step.

You can learn more in the Text generation with an RNN tutorial and the Recurrent
Neural Networks (RNN) with Keras guide.

In this tutorial, you will use an RNN layer called Long Short-Term Memory
(tf.keras.layers.LSTM).

An important constructor argument for all Keras RNN layers, such


as tf.keras.layers.LSTM, is the return_sequences argument. This setting can
configure the layer in one of two ways:

1. If False, the default, the layer only returns the output of the final time step,
giving the model time to warm up its internal state before making a single
prediction:
2. If True, the layer returns an output for each input. This is useful for:
• Stacking RNN layers.
• Training a model on multiple time steps simultaneously.
lstm_model = tf.keras.models.Sequential([
# Shape [batch, time, features] => [batch, time, lstm_units]
tf.keras.layers.LSTM(32, return_sequences=True),
# Shape => [batch, time, features]
tf.keras.layers.Dense(units=1)
])

With return_sequences=True, the model can be trained on 24 hours of data at a


time.

Note: This will give a pessimistic view of the model's performance. On the first
time step, the model has no access to previous steps and, therefore, can't do any
better than the simple linear and dense models shown earlier.

print('Input shape:', wide_window.example[0].shape)


print('Output shape:', lstm_model(wide_window.example[0]).shape)

Input shape: (32, 24, 19)


Output shape: (32, 24, 1)
history = compile_and_fit(lstm_model, wide_window)

IPython.display.clear_output()
val_performance['LSTM'] = lstm_model.evaluate(wide_window.val)
performance['LSTM'] = lstm_model.evaluate(wide_window.test, verbose=0)

438/438 [==============================] - 1s 3ms/step - loss: 0.0056 -


mean_absolute_error: 0.0516
wide_window.plot(lstm_model)

Performance

With this dataset typically each of the models does slightly better than the one
before it:

x = np.arange(len(performance))
width = 0.3
metric_name = 'mean_absolute_error'
metric_index = lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in val_performance.values()]
test_mae = [v[metric_index] for v in performance.values()]

plt.ylabel('mean_absolute_error [T (degC), normalized]')


plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=performance.keys(),
rotation=45)
_ = plt.legend()

for name, value in performance.items():


print(f'{name:12s}: {value[1]:0.4f}')
Baseline : 0.0852
Linear : 0.0688
Dense : 0.0616
Multi step dense: 0.0663
Conv : 0.0549
LSTM : 0.0529
Multi-output models

The models so far all predicted a single output feature, T (degC), for a single time
step.

All of these models can be converted to predict multiple features just by changing
the number of units in the output layer and adjusting the training windows to
include all features in the labels (example_labels):

single_step_window = WindowGenerator(
# `WindowGenerator` returns all features as labels if you
# don't set the `label_columns` argument.
input_width=1, label_width=1, shift=1)

wide_window = WindowGenerator(
input_width=24, label_width=24, shift=1)

for example_inputs, example_labels in wide_window.train.take(1):


print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
print(f'Labels shape (batch, time, features): {example_labels.shape}')

Inputs shape (batch, time, features): (32, 24, 19)


Labels shape (batch, time, features): (32, 24, 19)

Note above that the features axis of the labels now has the same depth as the
inputs, instead of 1.

Baseline

The same baseline model (Baseline) can be used here, but this time repeating all
features instead of selecting a specific label_index:

baseline = Baseline()
baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
metrics=[tf.keras.metrics.MeanAbsoluteError()])

val_performance = {}
performance = {}
val_performance['Baseline'] = baseline.evaluate(wide_window.val)
performance['Baseline'] = baseline.evaluate(wide_window.test, verbose=0)

438/438 [==============================] - 1s 2ms/step - loss: 0.0886 -


mean_absolute_error: 0.1589

Dense

dense = tf.keras.Sequential([
tf.keras.layers.Dense(units=64, activation='relu'),
tf.keras.layers.Dense(units=64, activation='relu'),
tf.keras.layers.Dense(units=num_features)
])

history = compile_and_fit(dense, single_step_window)

IPython.display.clear_output()
val_performance['Dense'] = dense.evaluate(single_step_window.val)
performance['Dense'] = dense.evaluate(single_step_window.test, verbose=0)

439/439 [==============================] - 1s 2ms/step - loss: 0.0684 -


mean_absolute_error: 0.1314

RNN

%%time
wide_window = WindowGenerator(
input_width=24, label_width=24, shift=1)

lstm_model = tf.keras.models.Sequential([
# Shape [batch, time, features] => [batch, time, lstm_units]
tf.keras.layers.LSTM(32, return_sequences=True),
# Shape => [batch, time, features]
tf.keras.layers.Dense(units=num_features)
])
history = compile_and_fit(lstm_model, wide_window)

IPython.display.clear_output()
val_performance['LSTM'] = lstm_model.evaluate( wide_window.val)
performance['LSTM'] = lstm_model.evaluate( wide_window.test, verbose=0)

print()

438/438 [==============================] - 1s 3ms/step - loss: 0.0616 -


mean_absolute_error: 0.1205

CPU times: user 3min 40s, sys: 42 s, total: 4min 22s


Wall time: 1min 38s

Advanced: Residual connections

The Baseline model from earlier took advantage of the fact that the sequence doesn't change drastically from time step to time step. Every model trained in this tutorial so far was randomly initialized, and then had to learn that the output is a small change from the previous time step.

While you can get around this issue with careful initialization, it's simpler to build
this into the model structure.

It's common in time series analysis to build models that instead of predicting the
next value, predict how the value will change in the next time step.
Similarly, residual networks—or ResNets—in deep learning refer to architectures
where each layer adds to the model's accumulating result.

That is how you take advantage of the knowledge that the change should be small.
Essentially, this initializes the model to match the Baseline. For this task it helps
models converge faster, with slightly better performance.

This approach can be used in conjunction with any model discussed in this tutorial.

Here, it is being applied to the LSTM model, note the use of


the tf.initializers.zeros to ensure that the initial predicted changes are small, and
don't overpower the residual connection. There are no symmetry-breaking
concerns for the gradients here, since the zeros are only used on the last layer.

class ResidualWrapper(tf.keras.Model):
def __init__(self, model):
super().__init__()
self.model = model

def call(self, inputs, *args, **kwargs):


delta = self.model(inputs, *args, **kwargs)

# The prediction for each time step is the input


# from the previous time step plus the delta
# calculated by the model.
return inputs + delta

%%time
residual_lstm = ResidualWrapper(
tf.keras.Sequential([
tf.keras.layers.LSTM(32, return_sequences=True),
tf.keras.layers.Dense(
num_features,
# The predicted deltas should start small.
# Therefore, initialize the output layer with zeros.
kernel_initializer=tf.initializers.zeros())
]))

history = compile_and_fit(residual_lstm, wide_window)

IPython.display.clear_output()
val_performance['Residual LSTM'] = residual_lstm.evaluate(wide_window.val)
performance['Residual LSTM'] = residual_lstm.evaluate(wide_window.test,
verbose=0)
print()

438/438 [==============================] - 1s 3ms/step - loss: 0.0621 -


mean_absolute_error: 0.1180

CPU times: user 1min 54s, sys: 21.4 s, total: 2min 15s
Wall time: 51.3 s

Performance

Here is the overall performance for these multi-output models.

x = np.arange(len(performance))
width = 0.3

metric_name = 'mean_absolute_error'
metric_index = lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in val_performance.values()]
test_mae = [v[metric_index] for v in performance.values()]

plt.bar(x - 0.17, val_mae, width, label='Validation')


plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=performance.keys(),
rotation=45)
plt.ylabel('MAE (average over all outputs)')
_ = plt.legend()

for name, value in performance.items():


print(f'{name:15s}: {value[1]:0.4f}')

Baseline : 0.1638
Dense : 0.1319
LSTM : 0.1217
Residual LSTM : 0.1193

The above performances are averaged across all model outputs.


Multi-step models

Both the single-output and multiple-output models in the previous sections


made single time step predictions, one hour into the future.

This section looks at how to expand these models to make multiple time step
predictions.

In a multi-step prediction, the model needs to learn to predict a range of future


values. Thus, unlike a single step model, where only a single future point is
predicted, a multi-step model predicts a sequence of the future values.

There are two rough approaches to this:

1. Single shot predictions where the entire time series is predicted at once.
2. Autoregressive predictions where the model only makes single step
predictions and its output is fed back as its input.

In this section all the models will predict all the features across all output time
steps.

For the multi-step model, the training data again consists of hourly samples.
However, here, the models will learn to predict 24 hours into the future, given 24
hours of the past.

Here is a Window object that generates these slices from the dataset:

OUT_STEPS = 24
multi_window = WindowGenerator(input_width=24,
label_width=OUT_STEPS,
shift=OUT_STEPS)

multi_window.plot()
multi_window

Total window size: 48


Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
23]
Label indices: [24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
45 46 47]
Label column name(s): None
Baselines

A simple baseline for this task is to repeat the last input time step for the required
number of output time steps:
class MultiStepLastBaseline(tf.keras.Model):
def call(self, inputs):
return tf.tile(inputs[:, -1:, :], [1, OUT_STEPS, 1])

last_baseline = MultiStepLastBaseline()
last_baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
metrics=[tf.keras.metrics.MeanAbsoluteError()])

multi_val_performance = {}
multi_performance = {}

multi_val_performance['Last'] = last_baseline.evaluate(multi_window.val)
multi_performance['Last'] = last_baseline.evaluate(multi_window.test, verbose=0)
multi_window.plot(last_baseline)

437/437 [==============================] - 1s 2ms/step - loss: 0.6285 -


mean_absolute_error: 0.5007
Since this task is to predict 24 hours into the future, given 24 hours of the past,
another simple approach is to repeat the previous day, assuming tomorrow will be
similar:

class RepeatBaseline(tf.keras.Model):
def call(self, inputs):
return inputs

repeat_baseline = RepeatBaseline()
repeat_baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
metrics=[tf.keras.metrics.MeanAbsoluteError()])

multi_val_performance['Repeat'] = repeat_baseline.evaluate(multi_window.val)
multi_performance['Repeat'] = repeat_baseline.evaluate(multi_window.test,
verbose=0)
multi_window.plot(repeat_baseline)

437/437 [==============================] - 1s 2ms/step - loss: 0.4270 -


mean_absolute_error: 0.3959

Single-shot models

One high-level approach to this problem is to use a "single-shot" model, where the
model makes the entire sequence prediction in a single step.
This can be implemented efficiently as
a tf.keras.layers.Dense with OUT_STEPS*features output units. The model just
needs to reshape that output to the required (OUTPUT_STEPS, features).

Linear

A simple linear model based on the last input time step does better than either
baseline, but is underpowered. The model needs to predict OUTPUT_STEPS time
steps, from a single input time step with a linear projection. It can only capture a
low-dimensional slice of the behavior, likely based mainly on the time of day and
time of year.

multi_linear_model = tf.keras.Sequential([
    # Take the last time step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_linear_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Linear'] = multi_linear_model.evaluate(multi_window.val)
multi_performance['Linear'] = multi_linear_model.evaluate(multi_window.test,
                                                          verbose=0)
multi_window.plot(multi_linear_model)

437/437 [==============================] - 1s 2ms/step - loss: 0.2550 - mean_absolute_error: 0.3046

Dense

Adding a tf.keras.layers.Dense between the input and output gives the linear model
more power, but is still only based on a single input time step.

multi_dense_model = tf.keras.Sequential([
    # Take the last time step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, dense_units]
    tf.keras.layers.Dense(512, activation='relu'),
    # Shape => [batch, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_dense_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Dense'] = multi_dense_model.evaluate(multi_window.val)
multi_performance['Dense'] = multi_dense_model.evaluate(multi_window.test,
                                                        verbose=0)
multi_window.plot(multi_dense_model)

437/437 [==============================] - 1s 2ms/step - loss: 0.2183 - mean_absolute_error: 0.2804

CNN

A convolutional model makes predictions based on a fixed-width history, which
may lead to better performance than the dense model since it can see how things
are changing over time:

CONV_WIDTH = 3
multi_conv_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, CONV_WIDTH, features]
    tf.keras.layers.Lambda(lambda x: x[:, -CONV_WIDTH:, :]),
    # Shape => [batch, 1, conv_units]
    tf.keras.layers.Conv1D(256, activation='relu', kernel_size=(CONV_WIDTH)),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_conv_model, multi_window)

IPython.display.clear_output()

multi_val_performance['Conv'] = multi_conv_model.evaluate(multi_window.val)
multi_performance['Conv'] = multi_conv_model.evaluate(multi_window.test,
                                                      verbose=0)
multi_window.plot(multi_conv_model)

437/437 [==============================] - 1s 2ms/step - loss: 0.2156 - mean_absolute_error: 0.2824

RNN

A recurrent model can learn to use a long history of inputs, if it's relevant to the
predictions the model is making. Here the model will accumulate internal state for
24 hours, before making a single prediction for the next 24 hours.

In this single-shot format, the LSTM only needs to produce an output at the last
time step, so set return_sequences=False in tf.keras.layers.LSTM.

multi_lstm_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, lstm_units].
    # Adding more `lstm_units` just overfits more quickly.
    tf.keras.layers.LSTM(32, return_sequences=False),
    # Shape => [batch, out_steps*features].
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features].
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_lstm_model, multi_window)

IPython.display.clear_output()

multi_val_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.val)
multi_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.test,
                                                      verbose=0)
multi_window.plot(multi_lstm_model)

437/437 [==============================] - 1s 3ms/step - loss: 0.2145 - mean_absolute_error: 0.2844

Advanced: Autoregressive model

The above models all predict the entire output sequence in a single step.

In some cases it may be helpful for the model to decompose this prediction into
individual time steps. Then, each model's output can be fed back into itself at each
step and predictions can be made conditioned on the previous one, like in the
classic Generating Sequences With Recurrent Neural Networks.

One clear advantage to this style of model is that it can be set up to produce output
with a varying length.

You could take any of the single-step multi-output models trained in the first half
of this tutorial and run it in an autoregressive feedback loop, but here you'll focus
on building a model that's been explicitly trained to do that.

RNN

This tutorial only builds an autoregressive RNN model, but this pattern could be
applied to any model that was designed to output a single time step.

The model will have the same basic form as the single-step LSTM models from
earlier: a tf.keras.layers.LSTM layer followed by a tf.keras.layers.Dense layer that
converts the LSTM layer's outputs to model predictions.

A tf.keras.layers.LSTM is a tf.keras.layers.LSTMCell wrapped in the higher-level
tf.keras.layers.RNN that manages the state and sequence results for you (check out
the Recurrent Neural Networks (RNN) with Keras guide for details).
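
As a rough illustration of that relationship, here is a hedged sketch (not part of
the tutorial code; the width of 32 units is arbitrary, and the two layers only match
up to their randomly initialized weights):

# These two layers process a [batch, time, features] tensor in the same way:
lstm_layer = tf.keras.layers.LSTM(32, return_sequences=True)
wrapped_cell = tf.keras.layers.RNN(tf.keras.layers.LSTMCell(32),
                                   return_sequences=True)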

In this case, the model has to manually manage the inputs for each step, so it
uses tf.keras.layers.LSTMCell directly for the lower level, single time step
interface.

class FeedBack(tf.keras.Model):
  def __init__(self, units, out_steps):
    super().__init__()
    self.out_steps = out_steps
    self.units = units
    self.lstm_cell = tf.keras.layers.LSTMCell(units)
    # Also wrap the LSTMCell in an RNN to simplify the `warmup` method.
    self.lstm_rnn = tf.keras.layers.RNN(self.lstm_cell, return_state=True)
    self.dense = tf.keras.layers.Dense(num_features)

feedback_model = FeedBack(units=32, out_steps=OUT_STEPS)

The first method this model needs is a warmup method to initialize its internal state
based on the inputs. Once trained, this state will capture the relevant parts of the
input history. This is equivalent to the single-step LSTM model from earlier:

def warmup(self, inputs):
  # inputs.shape => (batch, time, features)
  # x.shape => (batch, lstm_units)
  x, *state = self.lstm_rnn(inputs)

  # predictions.shape => (batch, features)
  prediction = self.dense(x)
  return prediction, state

FeedBack.warmup = warmup

This method returns a single time-step prediction and the internal state of
the LSTM:

prediction, state = feedback_model.warmup(multi_window.example[0])
prediction.shape

TensorShape([32, 19])

With the RNN's state and an initial prediction, you can now continue iterating the
model, feeding the prediction at each step back in as the input.

The simplest approach for collecting the output predictions is to use a Python list
and a tf.stack after the loop.

Note: Stacking a Python list like this only works with eager execution,
using Model.compile(..., run_eagerly=True) for training, or with a fixed-length
output. For a dynamic output length, you would need to use
a tf.TensorArray instead of a Python list, and tf.range instead of the
Python range.

def call(self, inputs, training=None):
  # Use a Python list to collect the predicted outputs.
  predictions = []
  # Initialize the LSTM state.
  prediction, state = self.warmup(inputs)

  # Insert the first prediction.
  predictions.append(prediction)

  # Run the rest of the prediction steps.
  for n in range(1, self.out_steps):
    # Use the last prediction as input.
    x = prediction
    # Execute one lstm step.
    x, state = self.lstm_cell(x, states=state,
                              training=training)
    # Convert the lstm output to a prediction.
    prediction = self.dense(x)
    # Add the prediction to the output.
    predictions.append(prediction)

  # predictions.shape => (time, batch, features)
  predictions = tf.stack(predictions)
  # predictions.shape => (batch, time, features)
  predictions = tf.transpose(predictions, [1, 0, 2])
  return predictions

FeedBack.call = call
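
For a dynamic output length, a hedged sketch of the tf.TensorArray / tf.range
variant mentioned in the note above might look like the following (the name
call_dynamic is hypothetical, and this is not part of the tutorial's FeedBack
model):

def call_dynamic(self, inputs, training=None):
  # Same warmup as before: one prediction plus the initial LSTM state.
  prediction, state = self.warmup(inputs)
  # A TensorArray can be written to inside a graph-mode loop.
  predictions = tf.TensorArray(tf.float32, size=self.out_steps)
  predictions = predictions.write(0, prediction)
  for n in tf.range(1, self.out_steps):
    x, state = self.lstm_cell(prediction, states=state, training=training)
    prediction = self.dense(x)
    predictions = predictions.write(n, prediction)
  # stack() returns (time, batch, features); move batch to the front.
  return tf.transpose(predictions.stack(), [1, 0, 2])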

Test run this model on the example inputs:

print('Output shape (batch, time, features): ',
      feedback_model(multi_window.example[0]).shape)

Output shape (batch, time, features):  (32, 24, 19)


Now, train the model:

history = compile_and_fit(feedback_model, multi_window)

IPython.display.clear_output()

multi_val_performance['AR LSTM'] = feedback_model.evaluate(multi_window.val)
multi_performance['AR LSTM'] = feedback_model.evaluate(multi_window.test,
                                                       verbose=0)
multi_window.plot(feedback_model)

437/437 [==============================] - 3s 6ms/step - loss: 0.2303 - mean_absolute_error: 0.3055

Performance

There are clearly diminishing returns as a function of model complexity on this
problem:

x = np.arange(len(multi_performance))
width = 0.3

metric_name = 'mean_absolute_error'
metric_index = lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in multi_val_performance.values()]
test_mae = [v[metric_index] for v in multi_performance.values()]

plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=multi_performance.keys(),
           rotation=45)
plt.ylabel(f'MAE (average over all times and outputs)')
_ = plt.legend()

The metrics for the multi-output models in the first half of this tutorial show the
performance averaged across all output features. These performances are similar
but also averaged across output time steps.

for name, value in multi_performance.items():
  print(f'{name:8s}: {value[1]:0.4f}')

Last : 0.5157
Repeat : 0.3774
Linear : 0.2979
Dense : 0.2762
Conv : 0.2765
LSTM : 0.2772
AR LSTM : 0.2969
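
If a tabular view is preferred, a small optional sketch follows (assuming pandas
is installed; it reuses metric_index from above and is not part of the tutorial
code):

import pandas as pd

metrics_table = pd.DataFrame({
    name: {'val_mae': val[metric_index], 'test_mae': test[metric_index]}
    for (name, val), test in zip(multi_val_performance.items(),
                                 multi_performance.values())
}).T
print(metrics_table)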

The gains achieved going from a dense model to convolutional and recurrent
models are only a few percent (if any), and the autoregressive model performed
clearly worse. So these more complex approaches may not be worthwhile on this
problem, but there was no way to know without trying, and these models could be
helpful for your problem.

References

Duke University, Summary of Rules for Identifying ARIMA Models.
Global Temperature Time Series Data.
Holmes, Scheuerell, & Ward, Applied Time Series Analysis.
Hyndman & Athanasopoulos, Forecasting: Principles and Practice.
Khan, ARIMA Model for Forecasting - Example in R.
Towards Data Science, The Complete Guide to Time Series Analysis and Forecasting.

Curriculum Vitae
Dr. Ahmed Fawzy Hassan Ghonaim

Personal information

Name:              Ahmed Fawzy Hassan Ghonaim
Nationality:       Egyptian
Date of birth:     16/4/1969
Place of birth:    Helwan, Cairo, Egypt
Gender:            Male
Marital status:    Married
E-mail:            [email protected]

Work mailing address: Department of Mathematics, Faculty of Science, Helwan
University, Ain Helwan, P.O. Box 11795, Cairo, Egypt

Faculty phone:     25552468
Fax:               25552468

Academic qualifications:
B.Sc. in Science (Mathematics), 1991, Helwan University.
M.Sc. in Science (Applied Mathematics), 1998, Helwan University.
Ph.D. in Science, Mathematics (Applied Mathematics), 2003, Helwan University.

Career history:
From 5/1/1992: Demonstrator, Department of Mathematics, Faculty of Science,
Helwan University.
From 19/3/1998: Assistant Lecturer, Department of Mathematics, Faculty of
Science, Helwan University.
From 30/3/2003 until now: Lecturer, Department of Mathematics, Faculty of
Science, Helwan University.
From 2/2005 to 9/2005: Assistant Professor, Faculty of Arts and Sciences, Sebha
University, Libya.
From 9/11/2007 to 4/6/2018: Assistant Professor, Community College, Al-Kharj,
Prince Sattam bin Abdulaziz University, Saudi Arabia.

Courses taught:
Probability and Mathematical Statistics 1
Advanced Probability and Mathematical Statistics
Algebra and Analytic Geometry
Calculus (Mathematical Analysis 1-2-3)
Ordinary Differential Equations
Partial Differential Equations
Special Functions
Logic and Foundations of Mathematics
Fourier Analysis and Laplace Transforms
Classical Mechanics (Statics and Dynamics 1-2-3)
Analytical Mechanics
Numerical Analysis
General Mathematics
Fluid Mechanics, Theory of Elasticity and its Applications, Electricity and
Magnetism

Society memberships:
Member of the Syndicate of Scientific Professions
Member of the distance-learning project, Faculty of Science, Helwan University
Member of the accreditation and quality project, Faculty of Science, Helwan
University

Published research:

1- Emad M. Abo El-Dahab and Ahmed F. Ghonaim, "Convective heat transfer in
   an electrically conducting micropolar fluid at a stretching surface with
   uniform free stream", Journal of Applied Mathematics and Computation,
   137 (2003) 323-326.

2- Emad M. Abo El-Dahab and Ahmed F. Ghonaim, "Radiation effect on
   convective heat transfer in an electrically conducting micropolar fluid at a
   stretching surface with variable viscosity", Horizons in World Physics,
   240 (2003) 83-101.

3- Emad M. Abo El-Dahab and Ahmed F. Ghonaim, "Radiation effect on heat
   transfer of a micropolar fluid through a porous medium", accepted at
   Applied Mathematics and Computation.

5- Ahmed F. Ghonaim, "Effect of the thermal dispersion on hydromagnetic
   micropolar flow and heat transfer past a continuously moving porous
   boundary with temperature dependent viscosity", accepted at the ICMTD07
   Conference, 2007.

6- Ahmed F. Ghonaim, "Convective heat transfer in an electrically conducting
   micropolar fluid at a stretching surface with uniform free stream", submitted
   to the Journal of the Mathematical and Physical Society of Egypt, 2009.

7- Ahmed F. Ghonaim, "Radiation effect on convective heat transfer in an
   electrically conducting micropolar fluid at a stretching surface with variable
   viscosity and thermal conductivity", submitted to the Journal of the
   Mathematical and Physical Society of Egypt, 2009.

Co-authored research (statistical analysis contributions):

1- Mohamed Farouk, Magdy Mustafa, Zyed, "Comparative study of body mass
   index effect on physical therapy results after total knee arthroplasty during
   hospitalization", Med. J. Cairo Univ., Vol. 82, No. 1, Dec., 1-8, 2014.

2- Mohamed Farouk, Shereen Elwardany, Reham AbdAlreheem, "Isolated
   lumbar stabilization exercises versus dynamic lumbar strengthening
   exercises in patients with spondylolisthesis", Med. J. Cairo Univ., Vol. 82,
   No. 2, June, 2014.

Training courses:

1- At Prince Sattam bin Abdulaziz University:
Strategic planning in academic institutions
Instructional design of electronic courses
Using the smart board in teaching
Computer skills for teaching
The Saudi Digital Library
Performance indicators and benchmarking
Managing education using the Blackboard system
External review of educational institutions
Classification of scientific journals
Statistical analysis and the university academic system

2- At Helwan University:
Thinking skills
Using technology in teaching
Designing a university course
Legal aspects
Financial aspects
Economics of scientific research
Modern trends in teaching
Accreditation and quality
Teaching large classes and micro-teaching