Time Series Analysis Book
Faculty of Science
Department of Mathematics
Time Series
Prepared by
Dr. Ahmed Fawzy
Publisher: Publishing and Distribution House for University Books, Helwan University
2023
Introduction
Time Series Analysis (TSA) is a way of studying the characteristics of a response variable with respect to time as the independent variable. To estimate or forecast the target variable, the time variable is used as the reference point. A time series is a sequence of time-ordered observations; the ordering may be by years, months, weeks, days, hours, minutes, or seconds, recorded at successive discrete intervals. Real-world applications of TSA include weather forecasting models, stock market prediction, signal processing, and control systems. Because TSA deals with information produced in a particular temporal sequence, it is distinct from spatial and other analyses. The future can be predicted using AR, MA, ARMA, and ARIMA models. In this chapter, we will be decoding time series analysis for you.
Learning Objectives
• Understand how a time series works and what factors affect a given variable (or variables) at different points in time.
• See how time series analysis provides insight into how the features of a given dataset change over time.
• Support predicting the future values of the time series variable.
• Assumptions: there is only one assumption in TSA, which is "stationarity," meaning that the origin of time does not affect the statistical properties of the process.
With the help of time series, we can prepare numerous time-based analyses and results.
• Trend: there is no fixed interval, and any divergence within the given dataset follows a continuous timeline. A trend can be negative, positive, or null.
• Seasonality: regular or fixed-interval shifts occur within the dataset over a continuous timeline. Its shape may be a bell curve or a saw tooth.
• Cyclical: there is no fixed interval; the movement and its pattern are uncertain.
• Irregularity: unexpected situations, events, or scenarios produce spikes over a short time span.
TSA also comes with some limitations and assumptions:
• Similar to other models, missing values are not supported by TSA.
• The data points must have a linear relationship.
• Data transformations are mandatory, so the analysis is a little expensive.
• Models mostly work on univariate data.
• The mean of the data should be constant throughout the analysis.
• The variance should be constant with respect to the time frame.
• The covariance, which measures the relationship between two values of the series, should depend only on the lag between them and not on time.
Detrending
Detrending involves removing the trend effects from the given dataset and showing only the deviations of the values from the trend; this makes cyclical patterns easier to identify.
Differencing
This is a simple transformation of the series into a new time series, used to remove the series' dependence on time and to stabilize its mean; trend and seasonality are reduced by this transformation.
• Y't = Yt − Yt−1
• Yt = value at time t
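As a rough illustration, first-order differencing can be done in one line with pandas; the column name 'value' and the numbers below are placeholders for whatever series is being analysed.
import pandas as pd

df = pd.DataFrame({'value': [10, 12, 15, 14, 18, 21]})  # toy series
df['diff_1'] = df['value'].diff()   # Yt - Yt-1 (the first row becomes NaN)
print(df)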
Transformation
This includes three different methods: the Power Transform, the Square Root transform, and the Log Transform. The most commonly used is the Log Transform.
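A minimal sketch of the three transformations on a toy series; numpy is assumed, and the values and the power exponent are purely illustrative.
import numpy as np

y = np.array([20.0, 45.0, 110.0, 250.0, 560.0])  # toy series with rapid growth
y_log = np.log(y)             # log transform (most commonly used)
y_sqrt = np.sqrt(y)           # square-root transform
y_power = np.power(y, 0.25)   # a power transform with an illustrative exponent
print(y_log, y_sqrt, y_power)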
The Moving Average (MA), or Rolling Mean: the value of the MA is calculated by averaging the time-series data over the last k periods.
import pandas as pd
from matplotlib import pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
df_temperature = pd.read_csv('temperature_TSA.csv', encoding='utf-8')
df_temperature.head()
Output
Code
df_temperature.info()
Output
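The plot below refers to 10-year and 20-year simple moving average columns that are not created in the snippets shown. A minimal sketch of how such columns could be added with pandas rolling means follows; the column name 'average_temperature' and the window sizes are assumptions chosen to match the legend used in the plot.
# add 10-year and 20-year simple moving averages (assumed column name)
df_temperature['SMA_10'] = df_temperature['average_temperature'].rolling(window=10).mean()
df_temperature['SMA_20'] = df_temperature['average_temperature'].rolling(window=20).mean()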
Code
# green = average air temperature, red = 10-year SMA, orange = 20-year SMA
colors = ['green', 'red', 'orange']
# Line plot
df_temperature.plot(color=colors, linewidth=3, figsize=(12,6))
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.legend(labels=['Average air temperature', '10-years SMA', '20-years SMA'], fontsize=14)
plt.title('The yearly average air temperature in city', fontsize=20)
plt.xlabel('Year', fontsize=16)
plt.ylabel('Temperature [°C]', fontsize=16)
Output
Cumulative Moving Average (CMA)
The CMA is the unweighted mean of past values till the current time.
Code
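The CMA computation itself is not shown; a minimal sketch with pandas, assuming the same 'average_temperature' column as above, would be:
# cumulative moving average: unweighted mean of all values up to each point in time
df_temperature['CMA'] = df_temperature['average_temperature'].expanding().mean()
df_temperature[['average_temperature', 'CMA']].plot(figsize=(12, 6))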
α → smoothing factor. The exponential moving average updates as EMA_t = α·Y_t + (1 − α)·EMA_(t−1).
Let's apply exponential moving averages with smoothing factors of 0.1 and 0.3 to the given dataset, as shown in the sketch below.
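The EMA columns plotted below are not computed in the snippet that follows; a sketch of how they could be produced with pandas ewm() is given here, with column names chosen to match the plot.
# exponential moving averages with smoothing factors 0.1 and 0.3
df_temperature['EMA_0.1'] = df_temperature['average_temperature'].ewm(alpha=0.1, adjust=False).mean()
df_temperature['EMA_0.3'] = df_temperature['average_temperature'].ewm(alpha=0.3, adjust=False).mean()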
Code
# green = average air temperature, red = EMA with alpha 0.1, yellow = EMA with alpha 0.3
colors = ['green', 'red', 'yellow']
df_temperature[['average_temperature', 'EMA_0.1', 'EMA_0.3']].plot(color=colors, linewidth=3, figsize=(12,6), alpha=0.8)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.legend(labels=['Average air temperature', 'EMA - alpha=0.1', 'EMA - alpha=0.3'], fontsize=14)
plt.title('The yearly average air temperature in city', fontsize=20)
plt.xlabel('Year', fontsize=16)
plt.ylabel('Temperature [°C]', fontsize=16)
Output
Time Series Analysis in Data Science and Machine Learning
When dealing with TSA in Data Science and Machine Learning, multiple model options are available, among them the Autoregressive Moving Average (ARMA) family and its extension ARIMA, described by the parameters p, d, and q.
Before we get to ARIMA, you should first understand the terms below.
plot_acf(df_temperature)
plt.show()
plot_acf(df_temperature, lags=30)
plt.show()
Output
Observation
Previous temperatures influence the current temperature, and the visualization above shows that the strength of this influence decays (with small fluctuations) as the lag between observations increases.
Types of Auto-Correlation
• ACF tails off gradually and PACF cuts off after lag p: an AR(p) model is suggested.
• ACF cuts off after lag q and PACF tails off gradually: an MA(q) model is suggested.
• Both ACF and PACF tail off gradually: an ARMA model is suggested.
• Both plots drop instantly (no significant lags): the series is essentially random, and you wouldn't fit any of these models.
Remember that both ACF and PACF require stationary time series for analysis.
Key Parameters
• p = number of past values (lags) used
• Yt = the value at time t, expressed as a function of its own past values
• Ert = the error term at time t
• C = the intercept
Putting these together, an AR(p) model has the form Yt = C + φ1·Yt−1 + φ2·Yt−2 + … + φp·Yt−p + Ert.
Let's check whether the given time series is random or not.
Code
# import libraries
from matplotlib import pyplot
from statsmodels.tsa.ar_model import AutoReg
from sklearn.metrics import mean_squared_error
from math import sqrt
# load csv as dataset
# series = read_csv('daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True, squeeze=True)
# split dataset into test and training sets
X = df_temperature.values
train, test = X[1:len(X)-7], X[len(X)-7:]
# train autoregression
model = AutoReg(train, lags=20)
model_fit = model.fit()
print('Coefficients: %s' % model_fit.params)
# predictions
predictions = model_fit.predict(start=len(train), end=len(train)+len(test)-1, dynamic=False)
for i in range(len(predictions)):
    print('predicted=%f, expected=%f' % (predictions[i], test[i]))
rmse = sqrt(mean_squared_error(test, predictions))
print('Test RMSE: %.3f' % rmse)
# plot results
pyplot.plot(test)
pyplot.plot(predictions, color='red')
pyplot.show()
Output
predicted=15.893972, expected=16.275000
predicted=15.917959, expected=16.600000
predicted=15.812741, expected=16.475000
predicted=15.787555, expected=16.375000
predicted=16.023780, expected=16.283333
predicted=15.940271, expected=16.525000
predicted=15.831538, expected=16.758333
Test RMSE: 0.617
Observation
Expected values (blue) plotted against predicted values (red). The forecast tracks the expected values closely around the 4th day, with the largest deviation on the 6th day.
Implementation of Moving Average (Weights – Simple Moving Average)
Code
import numpy as np
alpha = 0.3
n = 10
# weights - simple moving average
w_sma = np.repeat(1/n, n)
colors = ['green', 'yellow']
# weights - exponential moving average, alpha=0.3, adjust=False
w_ema = [(1-alpha)**i if i == n-1 else alpha*(1-alpha)**i for i in range(n)]
pd.DataFrame({'w_sma': w_sma, 'w_ema': w_ema}).plot(color=colors, kind='bar', figsize=(8,5))
plt.xticks([])
plt.yticks(fontsize=10)
plt.legend(labels=['Simple moving average', 'Exponential moving average (α=0.3)'], fontsize=10)
# title and labels
plt.title('Moving Average Weights', fontsize=10)
plt.ylabel('Weights', fontsize=10)
Output
Understanding ARMA and ARIMA
ARMA is a combination of the Auto-Regressive and Moving Average models for
forecasting. This model provides a weakly stationary stochastic process in terms of
two polynomials, one for the Auto-Regressive and the second for the Moving
Average.
ARMA is best for predicting stationary series. ARIMA was thus developed to
support both stationary as well as non-stationary series.
AR+I+MA= ARIMA
Understand the signature of ARIMA
Code
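The fitting step that produces results_ARIMA is not shown in this section. The sketch below assumes the legacy statsmodels interface (statsmodels.tsa.arima_model), whose results objects expose the forecast() and plot_predict() calls used here; the column name and the order (2, 1, 2) are illustrative assumptions. In current statsmodels versions the equivalent model lives in statsmodels.tsa.arima.model and has a slightly different interface.
# illustrative fit with the legacy statsmodels ARIMA API (order chosen arbitrarily)
from statsmodels.tsa.arima_model import ARIMA
model_ARIMA = ARIMA(df_temperature['average_temperature'], order=(2, 1, 2))
results_ARIMA = model_ARIMA.fit(disp=0)
print(results_ARIMA.summary())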
results_ARIMA.forecast(3)[0]
Output
results_ARIMA.plot_predict(start=200)
plt.show()
Output
Process Flow (Recap)
In recent years, the use of Deep Learning for Time Series Analysis and Forecasting
has increased to resolve problem statements that couldn’t be handled using Machine
Learning techniques. Let’s discuss this briefly.
Recurrent Neural Networks (RNNs) are the most traditional and widely accepted architecture for time-series forecasting problems. An RNN is built from three kinds of layers:
• Input
• Hidden
• Output
Each layer shares the same weights, and every neuron is assigned to a fixed time step. The input and output layers at each time step are fully connected to the hidden layer for that step, and the hidden layers are connected forward through time, so the network is time-dependent in direction.
Components of RNN
Internally, the recurrent weight matrix W links the hidden-layer neurons of consecutive time steps (t−1, t, t+1). The hidden layer is then connected to the output vector y(t) of time t by a weight matrix V, and the input is connected to the hidden layer by a weight matrix U; all the weight matrices U, W, and V are constant (shared) across time steps.
Advantages of RNN
• It has the special feature that it remembers information across the sequence, which makes RNNs very useful for time series prediction.
• It is well suited to capturing complex patterns in the input time series dataset.
• It is fast in prediction/forecasting once trained.
• It is not strongly affected by missing values, so the cleansing process can be limited.
Disadvantages of RNN
Conclusion
A time series is constructed from data measured over time at evenly spaced intervals. I hope this comprehensive guide has helped you understand time series, its flow, and how it works. Although TSA is widely used to handle data science problems, it has certain limitations, such as not supporting missing values. Note also that the data points must have a linear relationship for this kind of Time Series Analysis to be applicable.
Key Takeaways
• The four main components of a time series are Trend, Seasonality, Cyclical variation, and Irregularity.
Time series data often arise when monitoring industrial processes or tracking
corporate business metrics. The essential difference between modeling data via
time series methods or using the process monitoring methods discussed earlier in
this chapter is the following:
Time series analysis accounts for the fact that data points taken over time may
have an internal structure (such as autocorrelation, trend or seasonal variation)
that should be accounted for.
This section will give a brief overview of some of the more widely used techniques
in the rich and rapidly growing field of time series modeling and analysis.
Time series analysis is used for many applications, such as:
• Economic Forecasting
• Sales Forecasting
• Budgetary Analysis
• Stock Market Analysis
• Yield Projections
• Process and Quality Control
• Inventory Studies
• Workload Projections
• Utility Studies
• Census Analysis
There are many methods used to model and forecast time series. The fitting of time series models can be an ambitious undertaking, and many model-fitting techniques are available.
The analysis of time series where the data are not collected in
equal time increments is beyond the scope of this handbook.
Every day, humans make passive predictions when performing tasks such as
crossing a road, where they estimate the speed of cars and their distance from them,
or catching a ball by guessing its velocity and positioning their hands accordingly.
These skills are gained through experience and practice. However, predicting
complex phenomena like the weather or the economy can be difficult due to the
multitude of variables involved. Time series forecasting is used in such situations, relying on historical data and mathematical models to make predictions about future trends and patterns. In this section we will work through an example of forecasting, together with the underlying mathematical concepts, using an airline-passengers dataset.
Part 1: Mathematical Concepts
In the context of the time series forecasting algorithm used in this article, instead of
manually calculating the slope and intercept of the line, the algorithm uses a neural
network with LSTM layers to learn the underlying patterns and relationships in the
time series data. The neural network is trained on a portion of the data and then
used to make predictions for the remaining portion. In this algorithm, the prediction
for the next time step is based on the previous n_inputs time steps, which is similar
to the concept of using y(t) to predict y(t+1) in a simple linear regression.
However, instead of using a simple linear equation, the prediction in this algorithm
is generated using the activation function of the LSTM layer. The activation
function allows the model to capture non-linear relationships in the data, making it
more effective in capturing complex patterns in time series data.
Activation Function
The activation function used in the LSTM model is the rectified linear unit (ReLU)
activation function. This activation function is commonly used in deep learning
models because of its simplicity and effectiveness in dealing with the vanishing
gradient problem. In the LSTM model, the ReLU activation function is applied to
the output of each LSTM unit to introduce non-linearity in the model and allow it to
learn complex patterns in the data. The ReLU function has a simple thresholding behavior, f(x) = max(0, x): any negative input is mapped to zero and any positive input is passed through unchanged, making it computationally efficient.
Part 2 : Implementation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('airline-passengers.csv', index_col='Month', parse_dates=True)
df.index.freq = 'MS'
df.shape
df.columns
plt.figure(figsize=(20, 4))
plt.plot(df.Passengers, linewidth=2)
plt.show()
The code imports three important libraries: numpy, pandas, and matplotlib. The
pandas library is used to read in the ‘airline-passengers.csv’ file and set the ‘Month’
column as the index, which allows the data to be analyzed over time. The code then
uses the matplotlib library to create a line plot showing the number of airline
passengers over time. Finally, the plot is displayed using the ‘plt.show’ function.
This code is useful for anyone interested in analyzing time series data, and it
demonstrates how to use pandas and matplotlib to visualize trends in data.
nobs = 12
df_train = df.iloc[:-nobs]
df_test = df.iloc[-nobs:]
df_train.shape
df_test.shape
This code creates two new data frames ‘df_train’ and ‘df_test’ by splitting an
existing time series data frame ‘df’ into training and testing sets. The ‘nobs’
variable is set to 12, which means that the last 12 observations of ‘df’ will be used
for testing, while the rest of the data will be used for training. The training set is
stored in ‘df_train’ and consists of all rows in ‘df’ except for the last 12 rows, while
the testing set is stored in ‘df_test’ and consists of only the last 12 rows of ‘df’. The
‘shape’ attribute is then used to print the number of rows and columns in each data
frame, which confirms that the splitting was done correctly. This code is useful for
preparing time series data for modeling and testing purposes by splitting it into two
sets.
Model Architecture
from keras.preprocessing.sequence import TimeseriesGenerator
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(df_train)
scaled_train = scaler.transform(df_train)
scaled_test = scaler.transform(df_test)
n_inputs = 12
n_features = 1
generator = TimeseriesGenerator(scaled_train, scaled_train, length=n_inputs, batch_size=1)
for i in range(len(generator)):
    X, y = generator[i]
    print(f' \n {X.flatten()} and {y}')
This code snippet demonstrates how to use the ‘TimeseriesGenerator’ class from
Keras and the ‘MinMaxScaler’ class from scikit-learn to generate input and output
arrays for a time series forecasting model. The code first creates an instance of the
‘MinMaxScaler’ class and fits it to the training data set (‘df_train’) in order to scale
the data. The scaled data is then stored in ‘scaled_train’ and ‘scaled_test’ data
frames. The number of time steps (‘n_inputs’) is set to 12, and the number of
features (‘n_features’) is set to 1. A ‘TimeseriesGenerator’ object is created with the
‘scaled_train’ data and a window length of ‘n_inputs’ and a batch size of 1. Finally,
a loop is used to iterate over the ‘generator’ object and print out the input and
output arrays for each time step. The ‘X’ and ‘y’ variables represent the input and
output arrays for each time step, respectively. The ‘flatten()’ method is used to
convert the input array into a 1D array for easier printing. Overall, this code is
useful for preparing time series data for forecasting models using a sliding window
approach.
X.shape
This code returns the shape of the array 'X' produced by the generator. The 'shape' attribute is a property of NumPy arrays and returns a tuple representing the dimensions of the array. Given the generator configuration above, the shape is (1, 12, 1): one batch containing 12 time steps of a single feature.
from keras.models import Sequential
from keras.layers import LSTM, Dense
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape = (n_inputs, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.summary()
This code demonstrates how to create an LSTM neural network model for time
series forecasting using Keras. Firstly, the necessary Keras classes are imported,
including ‘Sequential’, ‘Dense’, and ‘LSTM’. The model is created as a
‘Sequential’ object and an LSTM layer is added with 200 neurons, the ‘relu’
activation function, and an input shape defined by ‘n_inputs’ and ‘n_features’. The
LSTM layer output is then passed to a ‘Dense’ layer with a single output neuron.
The model is compiled with the ‘adam’ optimizer and the mean squared error
(‘mse’) loss function. The ‘summary()’ method is used to display a summary of the
architecture, including the number of parameters and the shapes of the input and
output tensors for each layer. This code can be useful for creating an LSTM model
for time series forecasting, as it provides an easy-to-follow example that can be
adapted to different data sets and forecasting problems.
Training Phase
This code trains the LSTM neural network model using the ‘fit()’ method in Keras
for 50 epochs. The ‘TimeseriesGenerator’ object generates batches of input/output
pairs for the model to learn from. The ‘fit()’ method updates the model parameters
using backpropagation based on the loss function and optimizer defined during
model compilation. By training the model, it learns to make predictions on new,
unseen data based on patterns learned in the training data.
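The training call itself is not shown above; a minimal sketch consistent with the description (50 epochs over the generator) would be the following. Older standalone Keras versions would use model.fit_generator() instead.
# train the LSTM on the sliding-window generator for 50 epochs
model.fit(generator, epochs=50)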
plt.plot(model.history.history['loss'])
last_train_batch = scaled_train[-12:]
last_train_batch = last_train_batch.reshape((1, n_inputs, n_features))
model.predict(last_train_batch)
This code uses the trained LSTM neural network model to make predictions on a
new data point. The last 12 data points from the training data are selected, scaled,
and reshaped into the appropriate format for the model. The ‘predict()’ method is
called on the model with the reshaped data as input, and the output is the predicted
value for the next time step in the time series. This is an essential step in using the
LSTM model for time series forecasting.
scaled_test[0]
This code prints the first element of the scaled test data array. The ‘scaled_test’
variable is a NumPy array of the test data that has been transformed using the
‘MinMaxScaler’ object. Printing the first element of this array shows the scaled
value for the first time step in the test data.
Forecasting
y_pred = []
first_batch = scaled_train[-n_inputs:]
current_batch = first_batch.reshape(1, n_inputs, n_features)
for i in range(len(scaled_test)):
    batch = current_batch
    pred = model.predict(batch)[0]
    y_pred.append(pred)
    current_batch = np.append(current_batch[:, 1:, :], [[pred]], axis=1)
y_pred
scaled_test
This code generates predictions for the test data using the trained LSTM model. It
uses a for loop to loop over each element in the scaled test data. In each iteration,
the current batch is used to make a prediction using the ‘predict()’ method of the
model. The predicted value is then added to the ‘y_pred’ list and the current batch is
updated. Finally, the ‘y_pred’ list is printed along with the ‘scaled_test’ data to
compare the predicted values with the actual values. This step is crucial in
evaluating the performance of the LSTM model on the test data.
df_test
y_pred_transformed = scaler.inverse_transform(y_pred)
y_pred_transformed = np.round(y_pred_transformed,0)
y_pred_final = y_pred_transformed.astype(int)
y_pred_final
This code transforms the predicted values generated in the previous step back to the
original scale using the ‘inverse_transform()’ method of the scaler object. The
transformed values are rounded to the nearest integer using the ‘round()’ function
and converted to integers using the ‘astype()’ method. The resulting array of
predicted values, ‘y_pred_final’, is printed to show the final predicted values for the
test data. This step is important for evaluating the accuracy of the LSTM model’s
predictions on the original scale of the data.
df_test.values, y_pred_final
df_test['Predictions'] = y_pred_final
df_test
The code above shows the predicted values generated by the LSTM model being
added to the original test dataset. First, the ‘values’ attribute is used to extract the
values of the ‘df_test’ dataframe, which are then paired with the predicted values
‘y_pred_final’. Then, a new column called ‘Predictions’ is added to the ‘df_test’
dataframe to store the predicted values. Finally, the ‘df_test’ dataframe is printed
with the newly added ‘Predictions’ column. This step is important to visually
compare the actual values of the test dataset with the predicted values and evaluate
the accuracy of the model.
plt.figure(figsize=(15, 6))
plt.plot(df_train.index, df_train.Passengers, linewidth=2, color='black', label='Train Values')
plt.plot(df_test.index, df_test.Passengers, linewidth=2, color='green', label='True Values')
plt.plot(df_test.index, df_test.Predictions, linewidth=2, color='red', label='Predicted Values')
plt.legend()
plt.show()
This code block is generating a plot using the matplotlib library. It first sets the
figure size, and then plots the training data as a black line, the true test values as a
green line, and the predicted test values as a red line. It also adds a legend to the
plot and displays it using the show() method.
Mean Squared Error
The mean squared error (MSE) is a measure of how close a regression line is to
a set of points. It’s calculated by taking the average of the squared differences
between the predicted and actual values. The square root of the MSE is known
as the root mean squared error (RMSE), which is a popular measure of the
accuracy of predictions. In this code block, the RMSE is calculated using
the mean_squared_error function from the sklearn.metrics module and
the sqrt function from the math module. The RMSE is used to evaluate the
accuracy of the LSTM model's predictions compared to the true values in the
test set.
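For reference, with n test points, actual values y_i, and predicted values ŷ_i:
MSE = (1/n) · Σ (y_i − ŷ_i)²  and  RMSE = √MSE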
sqrt(mean_squared_error(df_test.Passengers, df_test.Predictions))
This code calculates the root mean squared error (RMSE) between the actual
passenger values in the test set (df_test.Passengers) and the predicted passenger
values (df_test.Predictions). RMSE is a commonly used metric to evaluate the
performance of regression models. It measures the average distance between the
predicted values and the actual values, taking into account the square of the
differences between them. RMSE is a useful metric because it penalizes large errors
more heavily than small errors, making it a good indicator of the overall accuracy
of a model's predictions.
Are you striving to build efficient, scalable, and resilient software systems? As a
software developer or senior developer, you must have come across the term
“microservices architecture.” This revolutionary approach to software development
has been adopted by many successful tech giants, such as Netflix, Amazon, and
Spotify. But, what exactly are microservices, and why should you care?
Microservices architecture is a software development technique that breaks down a
large application into smaller, manageable, and independent services. Each service
is responsible for a specific functionality and communicates with others through
well-defined APIs. This approach helps in achieving better scalability,
maintainability, and flexibility of software systems.
Did you know that 86% of developers reported increased productivity and faster
time to market when they embraced microservices? The secret behind this success
lies in understanding and implementing the right microservices patterns. These
patterns provide a solid foundation for designing and managing microservices-
based applications.
In this blog, we will dive into the top 12 microservices patterns that every software
engineer must know. By mastering these patterns, you will be well-equipped to
build powerful, fault-tolerant, and easily maintainable software systems. Are you
ready to level up your software development game? Let’s get started!
Are you tired of managing multiple entry points for your microservices? The API
Gateway pattern is here to save the day! Acting as a single entry point for all client
requests, the API Gateway simplifies access to your microservices, offering
seamless communication between clients and services.
Why should you care about the API Gateway? First, it helps in aggregating
responses from multiple microservices, reducing the number of round trips between
clients and services. This results in improved performance and user experience.
Second, it enables you to implement cross-cutting concerns such as authentication,
logging, and rate limiting at a single place, promoting consistency and reducing
redundancy.
Imagine the convenience of having a central hub that takes care of all these
responsibilities! According to a study by RapidAPI, 68% of developers who
adopted API Gateway reported improved security and simplified management of
their microservices.
Some popular API Gateway solutions include Amazon API Gateway, Kong, and
Azure API Management. These tools provide a range of features, such as caching,
throttling, and monitoring, to help you manage your microservices efficiently.
Are you struggling to keep track of your growing number of microservices? Worry
no more! The Service Discovery pattern is here to help you navigate the complex
world of microservices with ease. This pattern allows services to find each other
dynamically, ensuring smooth communication and reducing the need for manual
configuration.
Are you concerned about the ripple effect of failures in your microservices
architecture? Meet the Circuit Breaker pattern — your ultimate safeguard against
cascading failures. This pattern monitors for failures and prevents requests from
reaching a failing service, giving it time to recover and protecting the entire system
from collapse.
Circuit Breakers can be easily implemented using libraries like Netflix Hystrix and
Resilience4j. These libraries offer a range of features, such as fallback methods and
monitoring, to help you manage and recover from failures efficiently.
In essence, the Circuit Breaker pattern is a must-have for building resilient and
fault-tolerant microservices. By incorporating this pattern into your architecture,
you can effectively shield your system from the adverse effects of service failures.
Are you ready to fortify your microservices with the Circuit Breaker pattern?
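As a rough illustration of the idea only (not the Hystrix or Resilience4j API), a toy circuit breaker in Python might look like this; all names and thresholds are invented for the example.
import time

class CircuitBreaker:
    """Toy circuit breaker: opens after a number of consecutive failures,
    then allows a trial call once a cool-down period has passed."""
    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        # while the breaker is open, fail fast until the cool-down has elapsed
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
        self.failures = 0  # a success closes the breaker again
        return result

A service client would wrap its downstream calls in breaker.call(...), so that repeated failures trip the breaker and subsequent calls fail fast while the dependency recovers.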
Why should you consider the Load Balancing pattern? As your application grows,
uneven traffic distribution can lead to service degradation or even failure. Load
Balancing ensures that no single service becomes a bottleneck, resulting in
improved performance and reliability. In fact, 81% of developers who adopted
Load Balancing reported enhanced application responsiveness and reduced service
downtime.
Are you seeking ways to minimize the impact of service failures in your
microservices architecture? Look no further than the Bulkhead pattern! This pattern
isolates services and resources, ensuring that a failure in one service doesn’t bring
down your entire system.
Are you looking for ways to optimize the performance and scalability of your
microservices? The CQRS (Command Query Responsibility Segregation) pattern is
the answer! This pattern separates the read and write operations of your services,
allowing you to fine-tune each aspect independently for maximum efficiency.
Why should you consider the CQRS pattern? In traditional architectures, combining
read and write operations can lead to performance bottlenecks and increased
complexity. With CQRS, you can optimize each operation individually, resulting in
improved performance and easier maintenance. Studies show that 78% of
developers who adopted CQRS experienced enhanced system scalability and
responsiveness.
Implementing CQRS involves segregating your services into two distinct parts: one
for handling commands (write operations) and another for handling queries (read
operations). This separation allows you to apply different scaling, caching, and
database strategies for each operation type. Popular frameworks, such as Axon and
MediatR, offer built-in support for implementing the CQRS pattern.
Are you searching for a way to enhance the responsiveness and adaptability of your
microservices? The Event-Driven Architecture pattern is here to help! This pattern
leverages events to trigger actions in your services, enabling real-time
responsiveness and promoting loose coupling between services.
Are you seeking ways to improve your microservices’ resilience in the face of
transient failures? The Retry pattern has got you covered!
Why should you adopt the Retry pattern? In a microservices ecosystem, transient
failures such as network hiccups or service timeouts are inevitable. The Retry
pattern enables your services to recover gracefully from these issues, enhancing
overall system stability.
The key to successful implementation lies in defining a suitable retry strategy. This
strategy should include factors like the maximum number of retries, delay between
retries, and any exponential backoff. Libraries like Polly, Resilience4j, and Spring
Retry offer built-in support for implementing the Retry pattern in your
microservices.
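As a rough Python sketch of the idea (in practice you would lean on a library such as those mentioned above), a retry helper with exponential backoff and jitter might look like this; the function and parameter names are invented for the example.
import random
import time

def call_with_retries(operation, max_retries=3, base_delay=0.5):
    """Retry a flaky operation with exponential backoff plus a little jitter."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # give up after the last allowed attempt
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)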
10. Backends for Frontends Pattern (BFF): Optimize User Experience with
Tailored Service Aggregation
Are you looking to deliver a seamless user experience across multiple platforms?
Look no further than the Backends for Frontends (BFF) pattern! This pattern
involves creating dedicated backend services for each frontend, ensuring optimal
performance and user experience tailored to each platform.
Why should you consider the BFF pattern? In a microservices architecture, a single
backend service might not cater to the diverse requirements of different frontends.
The BFF pattern enables you to customize your backend services for each platform,
enhancing performance and user experience. A study found that 82% of developers
who adopted the BFF pattern reported improved user satisfaction and reduced
development complexity.
To implement the BFF pattern, you create separate backend services for each
frontend (e.g., web, mobile, IoT), aggregating and adapting the data specifically for
each platform’s requirements. Tools like GraphQL, Apollo Server, and Express.js
can facilitate the creation of custom backend services for your frontends.
BFF Pattern
Why should you adopt the Strangler pattern? Migrating from a monolithic
architecture to microservices can be challenging and risky. The Strangler pattern
allows for incremental replacement, minimizing downtime and risk while
maintaining business continuity. Studies reveal that 81% of developers who used
the Strangler pattern experienced a smoother migration with fewer issues.
In short, the Strangler pattern is an invaluable tool for transforming your monolithic
system into a microservices architecture with confidence. By following this pattern,
you can ensure a smooth and risk-free migration, setting your organization up for
success in the microservices era. Are you ready to embrace the Strangler pattern
and revolutionize your architecture?
Conclusion: Unlock the Full Potential of Your Microservices with These Top
Patterns
Why are these patterns essential? Research shows that developers who implement
these patterns experience improved system performance, scalability, and
maintainability. By leveraging these patterns, you can tackle complex challenges
like distributed transactions, service resilience, and user experience optimization
with confidence.
As a software engineer, staying ahead of the curve is crucial for your professional
growth. These patterns provide you with the essential tools to excel in the
microservices domain, setting you apart from your peers and enabling you to
deliver outstanding results.
System Design Master Template
To excel in system design, one of the most crucial aspects is to develop a deep
understanding of fundamental system design concepts such as Load
Balancing, Caching, Partitioning, Replication, Databases, and Proxies.
Through my own experiences, I’ve identified 16 key concepts that can make a
significant difference in your ability to tackle system design problems. These
concepts range from understanding the intricacies of API gateway and mastering
load-balancing techniques to grasping the importance of CDNs and appreciating the
role of caching in modern distributed systems. By the end of this blog, you’ll have a
comprehensive understanding of these essential ideas and the confidence to apply
them in your next interview.
Keeping this master template in mind, we will discuss the 16 essential system
design concepts. Here is their brief description:
When you enter a domain name into your web browser, the DNS is responsible for
locating the associated IP address and directing your request to the correct server.
The process begins with your computer sending a query to a recursive resolver,
which then searches a series of DNS servers, starting with the root server, followed
by the Top-Level Domain (TLD) server, and finally the authoritative name server.
Once the IP address is found, the recursive resolver returns it to your computer,
allowing your browser to establish a connection with the target server and access
the desired content.
DNS Resolver
2. Load Balancer
2. Least Connections: The load balancer assigns requests to the server with
the fewest active connections, prioritizing less-busy servers.
Load Balancer
3. API Gateway
An API Gateway is a server or service that acts as an intermediary between external
clients and the internal microservices or API-based backend services of an
application. It is a crucial component in modern architectures, especially in
microservices-based systems, where it simplifies the communication process and
provides a single entry point for clients to access various services.
4. Caching: To reduce latency and backend load, the API Gateway can
cache frequently-used responses, serving them directly to clients without
the need to query the backend services.
API Gateway
4. CDN
A Content Delivery Network (CDN) is a distributed network of servers that store
and deliver content, such as images, videos, stylesheets, and scripts, to users from
geographically closer locations. CDNs are designed to improve the performance,
speed, and reliability of content delivery to end-users, regardless of their location
relative to the origin server.
2. If the edge server has the requested content cached, it directly serves the
content to the user. This reduces latency and improves the user
experience, as the content travels a shorter distance.
3. If the content is not cached on the edge server, the CDN retrieves it from
the origin server or another nearby CDN server. Once the content is
fetched, it is cached on the edge server and served to the user.
A reverse proxy is a server that sits in front of one or more web servers and acts as
an intermediary between the web servers and the Internet. When a client makes a
request to a resource on the internet, the request is first sent to the reverse proxy.
The reverse proxy then forwards the request to one of the web servers, which
returns the response to the reverse proxy. The reverse proxy then returns the
response to the client.
6. Caching
The cache is a high-speed storage layer that sits between the application and the
original source of the data, such as a database, a file system, or a remote web
service. When data is requested by the application, it is first checked in the cache. If
the data is found in the cache, it is returned to the application. If the data is not
found in the cache, it is retrieved from its original source, stored in the cache for
future use, and returned to the application. In a distributed system, caching can be
done at multiple places for example, Client, DNS, CDN, Load Balancer, API
Gateway, Server, Database, etc.
What and where to cache
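A minimal cache-aside sketch in Python; the dictionary and the load_from_database function are invented stand-ins for a real cache (such as Redis) and a real data source.
cache = {}  # stand-in for a real cache such as Redis

def load_from_database(key):
    # placeholder for the slow, authoritative data source
    return f"value-for-{key}"

def get(key):
    # 1) check the cache first
    if key in cache:
        return cache[key]
    # 2) on a miss, read from the original source ...
    value = load_from_database(key)
    # 3) ... store it in the cache for future requests, then return it
    cache[key] = value
    return value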
7. Data Partitioning
Data partitioning
8. Database Replication
4. Load Balancing: Replicas can handle read queries, which allows for
better load distribution and reduces the overall strain on the primary
database.
10. Microservices
2. Key-Value: These databases store data as key-value pairs, where the key
acts as a unique identifier, and the value holds the associated data. Key-
value databases are highly efficient for simple read and write operations,
and they can be easily partitioned and scaled horizontally. Examples of
key-value NoSQL databases include Redis and Amazon DynamoDB.
Indexes are usually built on one or more columns of a database table. The most
common type of index is the B-tree index, which organizes data in a hierarchical
tree structure, allowing for fast search, insertion, and deletion operations. There are
other types of indexes, such as bitmap indexes and hash indexes, each with their
specific use cases and advantages.
While indexes can significantly improve query performance, they also have some
trade-offs:
Distributed file systems are storage solutions designed to manage and provide
access to files and directories across multiple servers, nodes, or machines, often
distributed over a network. They enable users and applications to access and
manipulate files as if they were stored on a local file system, even though the actual
files might be physically stored on multiple remote servers. Distributed file systems
are often used in large-scale or distributed computing environments to provide fault
tolerance, high availability, and improved performance.
These are used to send notifications or alerts to users, such as emails, push
notifications, or text messages.
Full-text search enables users to search for specific words or phrases within an app
or website. When a user queries, the app or website returns the most relevant
results. To do this quickly and efficiently, full-text search relies on an inverted
index, which is a data structure that maps words or phrases to the documents in
which they appear. An example of such systems is Elastic Search.
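As a tiny illustration of the data structure (real engines such as Elasticsearch build far richer versions, with tokenization, stemming and scoring), an inverted index can be sketched in Python as follows; the sample documents are invented.
from collections import defaultdict

documents = {
    1: "time series forecasting with ARIMA",
    2: "forecasting passengers with an LSTM",
}

# map each word to the set of document ids in which it appears
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        inverted_index[word].add(doc_id)

print(inverted_index["forecasting"])   # {1, 2}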
METHODS OF TIME SERIES

TIME SERIES
A time series is a set of data collected and arranged in order of time. According to Croxton and Cowden, "A time series consists of data arranged chronologically." Such data may be a series of temperatures of a patient, a series showing the number of suicides in different months of a year, etc. The analysis of a time series means separating out the different components which influence the values of the series. The variations in a time series can be divided into two parts: long-term variations and short-term variations. Long-term variations can be divided into two parts: Trend (or Secular Trend) and Cyclical variations. Short-term variations can be divided into two parts: Seasonal variations and Irregular variations.

METHODS FOR TIME SERIES ANALYSIS
In business forecasting, it is important to analyze the characteristic movements of variations in the given time series. The following methods serve as tools for this analysis:
1. Methods for measurement of Secular Trend
   i. Freehand Curve Method (Graphical Method)
   ii. Method of Selected Points
   iii. Method of Semi-Averages
   iv. Method of Moving Averages
   v. Method of Least Squares
2. Methods for measurement of Seasonal Variations
   i. Method of Simple Averages
   ii. Ratio to Trend Method
   iii. Ratio to Moving Average Method
   iv. Method of Link Relatives
3. Methods for measurement of Cyclical Variations
4. Methods for measurement of Irregular Variations

METHODS FOR MEASUREMENT OF SECULAR TREND
The following are the principal methods of measuring trend from a given time series:
1. GRAPHICAL OR FREEHAND CURVE METHOD
This is the simplest method of studying trend. In this method the given time series data are plotted on graph paper by taking time on the X-axis and the other variable on the Y-axis. The graph obtained will be irregular, as it includes short-run oscillations. We may observe the up-and-down movement of the curve, and if a smooth freehand curve is drawn passing approximately through all the points of the curve previously drawn, it will eliminate the short-run oscillations (seasonal, cyclical and irregular variations) and show the long-period general tendency of the data. This is exactly what is meant by trend. However, it is very difficult to draw a freehand smooth curve, and different persons are likely to draw different curves from the same data. The following points must be kept in mind in drawing a freehand smooth curve:
1. The curve should be smooth.
2. The number of points above the line or curve should be equal to the number of points below it.
3. The sum of the vertical deviations of the points above the smoothed line should be equal to the sum of the vertical deviations of the points below the line. In this way the positive deviations will cancel the negative deviations. These deviations are the effects of seasonal, cyclical and irregular variations, and by this process they are eliminated.
4. The sum of the squares of the vertical deviations from the trend line or curve should be minimum. (This is one of the characteristics of the trend line fitted by the method of least squares.)
The trend values can be read for various time periods by locating them on the trend line against each time period. The following example illustrates the fitting of a freehand curve to a set of time series values.
Example: The table below shows sales data for nine years:
Year: 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998
Sales (lakh units): 65, 95, 115, 63, 120, 100, 150, 135, 172
If we draw a graph taking the year on the x-axis and sales on the y-axis, it will be irregular. Drawing a freehand curve passing approximately through all these points will represent the trend line.
MERITS:
1. It is a simple method of estimating trend which requires no mathematical calculations.
2. It is a flexible method as compared to rigid mathematical trends and, therefore, a better representative of the trend of the data.
3. This method can be used even if the trend is not linear.
4. If the observations are relatively stable, the trend can easily be approximated by this method.
5. Being a non-mathematical method, it can be applied even by a common man.
DEMERITS:
1. It is a subjective method. The values of trend obtained by different statisticians would be different and hence not reliable.
2. Predictions made on the basis of this method are of little value.

2. METHOD OF SELECTED POINTS
In this method, two points considered to be the most representative or normal are joined by a straight line to get the secular trend. This, again, is a subjective method, since different persons may have different opinions regarding the representative points. Further, only a linear trend can be determined by this method.
3. METHOD OF SEMI-AVERAGES
Under this method, as the name itself suggests, semi-averages are calculated to find the trend values. By semi-averages is meant the averages of the two halves of a series. In this method, the given series is divided into two equal parts (halves) and the arithmetic mean of the values of each part (half) is calculated. The computed means are termed semi-averages. Each semi-average is paired with the centre of the time period of its part. The two pairs are then plotted on graph paper and the points are joined by a straight line to get the trend. It should be noted that if the data are for an even number of years, they can easily be divided into two halves. But if they are for an odd number of years, we leave out the middle year of the time series, and the two halves consist of the periods on each side of the middle year.
MERITS:
1. It is a simple method of measuring trend.
2. It is an objective method, because anyone applying it to a given data set would get identical trend values.
DEMERITS:
1. This method can give only a linear trend of the data, irrespective of whether such a trend exists or not.
2. This is only a crude method of measuring trend, since we do not know whether the effects of the other components are completely eliminated or not.
4. METHOD OF MOVING AVERAGES
This method is based on the principle that the total effect of periodic variations at different points of time in its cycle gets completely neutralized, i.e. ΣSt = 0 within a year and ΣCt = 0 within the period of the cyclical variations.
In the method of moving averages, successive arithmetic averages are computed from overlapping groups of successive values of a time series. Each group includes all the observations in a given time interval, termed the period of the moving average. The next group is obtained by replacing the oldest value with the next value in the series. The averages of such groups are known as moving averages. The moving average of a group is always shown at the centre of its period. The process of computing moving averages smooths out the fluctuations in the time series data. It can be shown that if the trend is linear and the oscillatory variations are regular, a moving average with period equal to the period of the oscillatory variations will minimize them, because the average of a number of observations must lie between the smallest and the largest observation. It should be noted that the larger the period of the moving average, the greater the reduction in the effect of random components, but also the more information lost at the two ends of the data and the more the curvature of curvilinear trends is flattened. When the trend is non-linear, the moving averages give biased rather than actual trend values.
Suppose that the successive observations, taken at equal intervals of time (say, yearly), are Y1, Y2, Y3, ...

Moving Average when the period is Odd
For a three-yearly moving average, we obtain the average of the first three consecutive years and place it against time t = 2; then the average of the next three consecutive years (beginning with the second year) and place it against time t = 3, and so on. This is illustrated below:
Time (t) | Observations Yt | Moving Total (3 years) | Moving Average (3 years)
1 | Y1 | — | —
2 | Y2 | Y1 + Y2 + Y3 | ⅓(Y1 + Y2 + Y3)
3 | Y3 | Y2 + Y3 + Y4 | ⅓(Y2 + Y3 + Y4)
4 | Y4 | Y3 + Y4 + Y5 | ⅓(Y3 + Y4 + Y5)
5 | Y5 | — | —
It should be noted that for an odd-period moving average, it is not possible to get the moving averages for the first and the last periods.

Moving Average when the period is Even
For an even-order moving average, two averaging processes are necessary in order to centre the moving average against periods rather than between periods. For example, for a four-yearly moving average we first obtain the average A1 = ¼(Y1 + Y2 + Y3 + Y4) of the first four years and place it between t = 2 and t = 3; then the average A2 = ¼(Y2 + Y3 + Y4 + Y5) of the next four years and place it between t = 3 and t = 4; and finally we obtain the average ½(A1 + A2) of these two averages and place it against time t = 3. Thus the moving average is brought against a time period rather than between periods. The same procedure is repeated for further values. This is tabulated below:
Time (t) | Observations Yt | 4-period moving average | Centred value
1 | Y1 | — | —
2 | Y2 | A1 = ¼(Y1 + Y2 + Y3 + Y4) | —
3 | Y3 | A2 = ¼(Y2 + Y3 + Y4 + Y5) | ½(A1 + A2)
4 | Y4 | — | —
It should be noted that when the period of the moving average is even, the computed average corresponds to the middle of the two middlemost periods, which is why centring is required.
MERITS:
1. This method is easy to understand and easy to use because there are no mathematical complexities involved.
2. It is an objective method in the sense that anybody working on a problem with this method will get the same trend values. It is in this respect better than the freehand curve method.
3. It is a flexible method in the sense that if a few more observations are added, the entire calculation does not change. This is not the case with the semi-average method.
4. When the period of the oscillatory movements is equal to the period of the moving average, these movements are completely eliminated.
5. By the indirect use of this method, it is also possible to isolate the seasonal, cyclical and random components.
DEMERITS:
1. It is not possible to calculate trend values for all the items of the series. Some information is always lost at its ends.
2. This method can determine accurate values of trend only if the oscillatory and random fluctuations are uniform in terms of period and amplitude and the trend is, at least, approximately linear. These conditions are rarely met in practice, and when the trend is not linear, the moving averages will not give correct values of the trend.
3. The trend values obtained by moving averages may not follow any mathematical pattern, i.e. the method fails to set up a functional relationship between the values of X (time) and Y (values), and thus it cannot be used for forecasting, which perhaps is the main task of any time series analysis.
4. The selection of the period of the moving average is a difficult task, and a great deal of care is needed to determine it.
5. Like the arithmetic mean, moving averages are too much affected by extreme values.
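As a rough illustration, both the 3-yearly and the centred 4-yearly moving averages described above can be computed with pandas; the series below reuses the sales figures from the freehand-curve example.
import pandas as pd

sales = pd.Series([65, 95, 115, 63, 120, 100, 150, 135, 172])  # sales data from the example above
ma3 = sales.rolling(window=3, center=True).mean()               # 3-yearly moving average
# 4-yearly trailing averages, averaged in overlapping pairs and shifted so that
# each centred value is placed against the middle of its span (t = 3, 4, ...)
ma4_centred = sales.rolling(window=4).mean().rolling(window=2).mean().shift(-2)
print(pd.DataFrame({'Y': sales, 'MA(3)': ma3, 'Centred MA(4)': ma4_centred}))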
5. METHOD OF LEAST SQUARES
This is one of the most popular methods of fitting a mathematical trend. The fitted trend is termed the best in the sense that the sum of squares of the deviations of the observations from it is minimized. The method of least squares may be used to fit either a linear trend or a non-linear trend (parabolic and exponential trends).

FITTING OF LINEAR TREND
Given the data (Yt, t) for n periods, where t denotes the time period (year, month, day, etc.), we have to find the values of the two constants 'a' and 'b' of the linear trend equation:
Yt = a + bt
where 'a' is the Y-intercept, or the height of the line above the origin (when t = 0, Y = a), and 'b' represents the slope of the trend line. When b is positive the slope is upward, and when b is negative the slope is downward. This line is termed the line of best fit because it is fitted so that the total of the squared deviations of the given data from the line is minimum; the squaring of the differences between trend values and actual values is what attaches the term "least squares" to this method.
Using the least squares method, the normal equations for obtaining the values of a and b are:
ΣYt = na + bΣt
ΣtYt = aΣt + bΣt²
Let X = t − A, such that ΣX = 0, where A denotes the year of origin. The above equations can then be written as:
ΣY = na + bΣX
ΣXY = aΣX + bΣX²
Since ΣX = 0 (the deviations from the actual mean sum to zero), we can write:
a = ΣY / n
b = ΣXY / ΣX²

FITTING OF PARABOLIC TREND
The mathematical form of a parabolic trend is:
Yt = a + bt + ct²
Here a, b and c are constants to be determined from the given data. Using the method of least squares, the normal equations for the simultaneous solution of a, b and c are:
ΣY = na + bΣt + cΣt²
ΣtY = aΣt + bΣt² + cΣt³
Σt²Y = aΣt² + bΣt³ + cΣt⁴
By selecting a suitable year of origin, i.e. defining X = t − origin such that ΣX = 0, the computation can be considerably simplified. Also note that if ΣX = 0, then ΣX³ will also be equal to zero. Thus the above equations can be rewritten as:
ΣY = na + cΣX² ..........(1)
ΣXY = bΣX² ..........(2)
ΣX²Y = aΣX² + cΣX⁴ ..........(3)
From equation (2) we get b = ΣXY / ΣX².
From equation (1) we get a = (ΣY − cΣX²) / n.
And from equation (3) we get c = (nΣX²Y − ΣX²·ΣY) / (nΣX⁴ − (ΣX²)²), or equivalently c = (ΣX²Y − aΣX²) / ΣX⁴.
These are the three equations used to find the values of the constants a, b and c.

FITTING OF EXPONENTIAL TREND
The general form of an exponential trend is:
Y = a·bᵗ
where 'a' and 'b' are constants to be determined from the observed data. Taking logarithms of both sides, we have log Y = log a + t·log b. This is a linear equation in log Y and t, and it can be fitted in the same way as a linear trend. Let A = log a and B = log b; then the above equation can be written as:
log Y = A + Bt
The normal equations based on the principle of least squares are:
Σlog Y = nA + BΣt
Σt·log Y = AΣt + BΣt²
By selecting a suitable origin, i.e. defining X = t − origin such that ΣX = 0, the computation can be simplified. The values of A and B are then given by:
A = Σlog Y / n
B = ΣX·log Y / ΣX²
Thus, the fitted trend equation can be written as:
log Y = A + BX, or Y = Antilog[A + BX] = Antilog[log a + X·log b] = a·bˣ

MERITS:
1. Given the mathematical form of the trend to be fitted, the least squares method is an objective method.
2. Unlike the moving average method, it is possible to compute trend values for all the periods and to predict the value for a period lying outside the observed data.
3. The results of the method of least squares are the most satisfactory because the fitted trend satisfies the two most important properties: (1) Σ(Y0 − Yt) = 0 and (2) Σ(Y0 − Yt)² is minimum, where Y0 denotes the observed values and Yt the calculated trend values. The first property implies that the fitted trend is positioned so that the sum of the deviations of observations above and below it is equal to zero. The second property implies that the sum of squares of the deviations of the observations about the trend equation is minimum.
DEMERITS:
1. Compared with the moving average method, it is a cumbersome method.
2. It is not flexible like the moving average method: if some observations are added, the entire calculation has to be done again.
3. It can predict or estimate values only in the immediate future or past.
4. The computation of trend values by this method does not take into account the other components of a time series and hence is not fully reliable.
5. Since the choice of a particular type of trend is arbitrary, the method is not strictly objective.
6. This method cannot be used to fit growth curves, the pattern followed by most economic and business time series.
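As a quick numerical sketch, the linear trend formulas above can be applied with numpy to the sales figures from the freehand-curve example (years 1990 to 1998):
import numpy as np

years = np.arange(1990, 1999)
sales = np.array([65, 95, 115, 63, 120, 100, 150, 135, 172])
X = years - years.mean()                  # choose the origin so that ΣX = 0
a = sales.mean()                          # a = ΣY / n
b = (X * sales).sum() / (X ** 2).sum()    # b = ΣXY / ΣX²
trend = a + b * X                         # fitted trend values
print(round(a, 2), round(b, 2))
print(trend.round(1))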
VARIATIONS The measurement of seasonal variations is done by isolating them
from other components of a time series. There are four methods commonly used
for the measurement of seasonal variations. These methods are: 1. Method of
Simple Average 2.Ratio to Trend Method 18 3.Ratio to Moving Average Method
4. Method of link Relatives 1. METHOD OF SIMPLE AVERAGE This is the
easiest and the simplest method of studying seasonal variations. This method is used when the time series variable consists of only the seasonal and random components. The effect of taking the average of data corresponding to the same period (say, the first quarter of each year) is to eliminate the effect of the random component, so that the resulting averages consist of the seasonal component only. These averages are then converted into seasonal indices. It involves the following steps if the figures are given on a monthly basis:
1. Arrange the raw data month-wise for each year.
2. Find the sum of all the figures relating to a month, i.e. add all the January values for all the years, and repeat the process for all the months.
3. Find the average of the monthly figures, i.e. divide each monthly total by the number of years. For example, if data for 5 years are available on a monthly basis, there will be five figures for January; these are totalled and divided by five to get the average figure for January. Obtain such figures for all months; call them X1, X2, X3, ..., X12.
4. Obtain the average of the monthly averages by dividing their sum by 12, i.e.
X̄ = (X1 + X2 + X3 + ... + X12) / 12
5. Taking the average of the monthly averages as 100, find the percentage of each monthly average. For the average of January (X1) this percentage would be:
(Average of January / Average of monthly averages) × 100, i.e. (X1 / X̄) × 100
If the monthly totals are used instead of the averages, the result would be the same.
MERITS AND DEMERITS
This is the simplest method of measuring seasonal variations. However, this method is based on the unrealistic assumption that the trend and cyclical variations are absent from the data.
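A minimal pandas sketch of the simple-average method, assuming a hypothetical DataFrame df with a monthly DatetimeIndex and a column named 'value' (both names are illustrative, not from the text):

import pandas as pd

# df is assumed to have a monthly DatetimeIndex and a 'value' column
monthly_avg = df['value'].groupby(df.index.month).mean()   # steps 2-3: average per calendar month
grand_avg = monthly_avg.mean()                             # step 4: average of the monthly averages
seasonal_index = monthly_avg / grand_avg * 100             # step 5: express as percentages of 100
print(seasonal_index)                                      # the 12 indices average to 100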
2. RATIO TO TREND METHOD
This method is used when the cyclical variations are absent from the data, i.e. the time series variable Y consists of the trend, seasonal and random components only. Using symbols, we can write Y = T · S · R.
The steps in the computation of seasonal indices are:
1. Obtain the trend values for each month or quarter, etc., by the method of least squares.
2. Divide the original values by the corresponding trend values. This eliminates the trend from the data.
3. Multiply the resulting quotients by 100 to express them as percentages. Thus we have
(Y / T) × 100 = (T · S · R / T) × 100 = S · R × 100
MERITS AND DEMERITS
It is an objective method of measuring seasonal variations. However, it is fairly complicated and does not work if cyclical variations are present.
3. RATIO TO MOVING AVERAGE METHOD
The ratio to moving average is the most commonly used method of measuring seasonal variations. This method assumes the presence of all four components of a time series. The steps in the computation of seasonal indices are as follows:
1. Compute the moving averages with period equal to the period of the seasonal variations. This eliminates the seasonal component and minimizes the effect of the random component. The resulting moving averages consist of the trend, cyclical and changed random components, i.e. M.A. = T · C · R′.
2. Divide the original value for each quarter (or month) by the respective moving average figure and express the ratio as a percentage, i.e.
S · R″ × 100 = (Y / M.A.) × 100 = [(T · C · S · R) / (T · C · R′)] × 100,
where R′ and R″ denote the changed random components.
3. Finally, the random component R″ is eliminated by the method of simple averages.
MERITS AND DEMERITS
This method assumes that all four components of a time series are present and is, therefore, widely used for measuring seasonal variations. However, the seasonal variations are not completely eliminated if the cycles of these variations are not of a regular nature. Further, some information is always lost at the ends of the time series.
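A minimal pandas sketch of these steps for quarterly data, continuing the same hypothetical df with a 'value' column (the column name and the 4-period window are illustrative assumptions):

# 1. 4-quarter centred moving average (approximates T * C); with an even window,
#    a further 2-term average is often taken to centre it exactly
ma = df['value'].rolling(window=4, center=True).mean()
# 2. Ratio of the original values to the moving average, as a percentage (S * R'')
ratios = df['value'] / ma * 100
# 3. Average the ratios quarter by quarter to remove R'', then rescale so the indices average to 100
seasonal_index = ratios.groupby(df.index.quarter).mean()
seasonal_index = seasonal_index / seasonal_index.mean() * 100
print(seasonal_index)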
4. LINK RELATIVES METHOD
This method is based on the assumptions that the trend is linear and that the cyclical variations are of uniform pattern. The link relatives are percentages of the current period (quarter or month) compared with the previous period. By computing the link relatives and their averages, the effect of the cyclical and random components is minimized. Further, the trend is eliminated in the process of adjusting the chained relatives. The following steps are involved in the computation of seasonal indices by this method:
1. Compute the Link Relative (L.R.) of each period by dividing the figure of that period by the figure of the previous period, e.g.
L.R. of 3rd quarter = (figure of 3rd quarter / figure of 2nd quarter) × 100.
2. Obtain the average of the link relatives of a given quarter (or month) over the various years. Either the arithmetic mean or the median can be used for this purpose; theoretically, the latter is preferable because the former gives undue importance to extreme items.
3. Convert these averages into chained relatives by assuming the chained relative of the first quarter (or month) to be 100. The chained relative (C.R.) for the current period (quarter or month) = (C.R. of the previous period × L.R. of the current period) / 100.
4. Compute the C.R. of the first quarter (or month) on the basis of the last quarter (or month). This is given by
new C.R. of 1st quarter = (C.R. of the last quarter × average L.R. of the first quarter) / 100.
(a) This value, in general, is different from 100 due to the long-term trend in the data. The chained relatives obtained above must therefore be adjusted for the effect of this trend. The adjustment factor is
d = (1/4) × (new C.R. for 1st quarter – 100) for quarterly data,
d = (1/12) × (new C.R. for 1st month – 100) for monthly data.
(b) On the assumption that the trend is linear, d, 2d, 3d, etc. are subtracted from the 2nd, 3rd, 4th, etc. quarter (or month) respectively.
5. Express the adjusted chained relatives as a percentage of their average to obtain the seasonal indices.
6. Make sure that the sum of these indices is 400 for quarterly data and 1200 for monthly data.
MERITS AND DEMERITS
This
method is less complicated than the ratio to moving average and the ratio to trend
methods. However, this method is based upon the assumption of a linear trend
which may not always hold true.
MEASUREMENT OF CYCLICAL VARIATIONS
A satisfactory method for the direct measurement of cyclical variations is not available. The main reason is that, although these variations may be recurrent, they are seldom found to follow a similar pattern with the same period and amplitude of oscillation. Moreover, in most cases these variations are so intermixed with the random variations that it is very difficult, if not impossible, to separate them. The cyclical variations are therefore often obtained, indirectly, as a residue after the elimination of the other components. The steps of the method are as follows:
1. Compute the trend values (T) and the seasonal indices (S) by appropriate methods. Here S is taken as a fraction rather than a percentage.
2. Divide the Y-values by the product of the trend and the seasonal index. The resulting ratio consists of the cyclical and random components, i.e.
C · R = Y / (T · S)
3. If there are no random variations in the time series, the cyclical variations are given by step (2) above. Otherwise, the random variations should be smoothed out by computing moving averages of the C · R values with an appropriate period. A weighted moving average with suitable weights may also be used, if necessary, for this purpose.
MEASUREMENT OF RANDOM VARIATIONS
The random variations are also known as irregular variations. Because of their nature, it is very difficult to devise a formula for their direct computation. Like the cyclical variations, this component can be obtained as a residue after eliminating the effects of the other components.
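Continuing the hypothetical quarterly example from above, a short sketch of this residual approach: divide the series by a fitted trend and the (fractional) seasonal index, then smooth what is left with a moving average. The linear trend fit and the 3-period window are illustrative choices, not prescribed by the text.

import numpy as np
import pandas as pd

t = np.arange(len(df))
trend = np.polyval(np.polyfit(t, df['value'], 1), t)           # T: fitted linear trend
s = (seasonal_index / 100).reindex(df.index.quarter).values    # S: seasonal index as a fraction
cr = df['value'].values / (trend * s)                          # C * R = Y / (T * S)
cyclical = pd.Series(cr, index=df.index).rolling(window=3, center=True).mean()  # smooth out R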
Different people can take different approaches to this: one might take the mean of all observations, another the mean of the two most recent observations, another might give more weight to the current observation and less to the past, and yet another might use interpolation. In other words, there are many different methods for forecasting the values.
While forecasting time series values, three important components need to be taken care of, and the main task of time series forecasting is to model these three components.
1) Seasonality
Seasonality simply means that, in a particular domain, there are some months in which the output value peaks compared with other months. For example, if you observe the data of tour and travel companies over the past 3 years, you will see that in November and December demand is very high due to the holiday and festival season. So while forecasting time series data we need to capture this seasonality.
2) Trend
The trend is another important factor. It describes whether there is a steadily increasing or decreasing movement in the time series, i.e. whether the value of interest (for example, an organization's sales) is growing or shrinking over a period of time.
3) Unexpected Events
Unexpected events are dynamic changes that occur in an organization or in the market and that cannot be captured in advance. For example, during the current pandemic, if you observe the Sensex or Nifty chart, there is a huge drop in stock prices; this is an unexpected event occurring in the surroundings.
There are methods and algorithms with which we can capture seasonality and trend, but unexpected events occur dynamically, so capturing them becomes very difficult.
If the time series is not stationary, we have to make it stationary and then proceed with modelling. Rolling statistics help us in making a time series stationary; essentially, rolling statistics compute a moving average. To calculate the moving average we need to define a window size, which is basically how many past values are to be considered, as shown in the sketch below.
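A minimal sketch of rolling statistics with pandas, using the df_temperature DataFrame loaded earlier; the column name 'temperature' and the window of 12 are assumptions for illustration:

from matplotlib import pyplot as plt

# 12-period rolling mean and standard deviation; the window size is how many past values are used
rolling_mean = df_temperature['temperature'].rolling(window=12).mean()
rolling_std = df_temperature['temperature'].rolling(window=12).std()
plt.plot(df_temperature['temperature'], color='blue', label='Original')
plt.plot(rolling_mean, color='red', label='Rolling mean')
plt.plot(rolling_std, color='black', label='Rolling std')
plt.legend()
plt.show()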
This is one method of making a time series stationary; there are other methods as well, which we are going to study, such as exponential smoothing.
yT = α * xT + (1 − α) * yT−1
Here the smoothed value at time T combines the current observation xT with the previous smoothed value yT−1. Expanding the recursion shows that the weights on past observations decay as α, α(1 − α), α(1 − α)², and so on, so recent observations count more than older ones and the equation gives us the smoothed level of the series.
from statsmodels.tsa.api import SimpleExpSmoothing

data = df[1:50]
# Two simple exponential smoothing fits with different smoothing levels (alpha)
fit1 = SimpleExpSmoothing(data).fit(smoothing_level=0.2, optimized=False)
fit2 = SimpleExpSmoothing(data).fit(smoothing_level=0.8, optimized=False)
plt.figure(figsize=(18, 8))
plt.plot(df[1:50], marker='o', color='black')       # original series
plt.plot(fit1.fittedvalues, marker='o', color='b')   # alpha = 0.2 (smoother)
plt.plot(fit2.fittedvalues, marker='o', color='r')   # alpha = 0.8 (closer to the data)
plt.xticks(rotation='vertical')
plt.show()
Step-4) Holt method for exponential smoothing
Holt's method is a popular method for exponential smoothing and is also known as linear exponential smoothing. It forecasts data with a trend. It works with three separate equations (a forecast equation plus smoothing equations for the level and the trend) that work together to generate the final forecast. Let us apply this to our data and observe the changes, as in the sketch below. In the first fit we assume that there is a linear (additive) trend in the data, and in the second fit we use an exponential (multiplicative) trend.
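The article describes these two fits but the code is not shown here; a minimal sketch with statsmodels, reusing the same data slice as above (the smoothing parameters and the exponential=True flag for the multiplicative-trend variant are illustrative assumptions):

from statsmodels.tsa.api import Holt

# Fit 1: additive (linear) trend
fit1 = Holt(data).fit(smoothing_level=0.8, smoothing_trend=0.2, optimized=False)
# Fit 2: exponential (multiplicative) trend
fit2 = Holt(data, exponential=True).fit(smoothing_level=0.8, smoothing_trend=0.2, optimized=False)
plt.figure(figsize=(18, 8))
plt.plot(data, marker='o', color='black')
plt.plot(fit1.fittedvalues, color='b', label='Linear trend')
plt.plot(fit2.fittedvalues, color='r', label='Exponential trend')
plt.legend()
plt.show()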
from statsmodels.tsa.seasonal import seasonal_decompose

# Additive Decomposition
add_result = seasonal_decompose(DrugSalesData['Value'], model='additive', period=1)
# Multiplicative Decomposition
mul_result = seasonal_decompose(DrugSalesData['Value'], model='multiplicative', period=1)
We imported the seasonal_decompose function from statsmodels and ran it with both the additive and the multiplicative model. Now let us visualize the result of each model one by one, starting with the plot of the additive decomposition.
The p-value is greater than 5 per cent, which means the series is non-stationary; we cannot build a model on non-stationary data, so we have to make the time series stationary. There are different methods for making a time series stationary, which, together with diagnostics such as the ACF and PACF, we will cover in the second part of this article. A sketch of the stationarity check itself is given below.
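The p-value mentioned above typically comes from an Augmented Dickey-Fuller test; that step is not shown in the article, so here is a hedged sketch using statsmodels on the drug-sales series assumed above:

from statsmodels.tsa.stattools import adfuller

result = adfuller(DrugSalesData['Value'].dropna())
print('ADF statistic:', result[0])
print('p-value      :', result[1])   # > 0.05 suggests the series is non-stationary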
End Notes
We have seen what time-series data is and what makes time-series analysis a special and complex task in machine learning. We have also worked through practical examples of how to start working with time series data, how to perform various analyses, and how to draw inferences from them. In the upcoming part, we will discuss various methods to make a time series stationary, and we will also discuss classical time series models such as ARIMA and SARIMA.
I hope it was easy to follow till the end. Handling time-series data is a little complex, but after reading this article you should have some understanding of, and confidence in, working with it. If you have any queries, please post them in the comment section below.
(Source: https://www.analyticsvidhya.com/blog/2021/07/time-series-forecasting-complete-tutorial-part-1/)
This tutorial builds models that forecast a single time step ahead for either:
• A single feature.
• All features.
It then goes on to forecast multiple steps, in two styles:
• Single shot: the entire sequence of predictions is made at once.
• Autoregressive: make one prediction at a time and feed the output back to the model.
Setup
import os
import datetime
import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False
This tutorial uses a weather time series dataset recorded by the Max Planck Institute for
Biogeochemistry.
zip_path = tf.keras.utils.get_file(
    origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
    fname='jena_climate_2009_2016.csv.zip',
    extract=True)
csv_path, _ = os.path.splitext(zip_path)
df = pd.read_csv(csv_path)
# Slice [start:stop:step], starting from index 5 take every 6th record.
df = df[5::6]
Let's take a glance at the data. Here are the first few rows:
df.head()
# `date_time` and `plot_cols` are defined earlier in the original tutorial; shown here so the plot runs:
date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y %H:%M:%S')
plot_cols = ['T (degC)', 'p (mbar)', 'rho (g/m**3)']
plot_features = df[plot_cols][:480]
plot_features.index = date_time[:480]
_ = plot_features.plot(subplots=True)
Inspect and cleanup
df.describe().transpose()
Wind velocity
One thing that should stand out is the min value of the wind velocity (wv (m/s)) and the
maximum value (max. wv (m/s)) columns. This -9999 is likely erroneous.
There's a separate wind direction column, so the velocity should be greater than zero
(>=0). Replace it with zeros:
wv = df['wv (m/s)']
bad_wv = wv == -9999.0
wv[bad_wv] = 0.0
# The same -9999 fix applies to the 'max. wv (m/s)' column mentioned above.
df['wv (m/s)'].min()
Output
0.0
Feature engineering
Before diving in to build a model, it's important to understand your data and be sure that
you're passing the model appropriately formatted data.
Wind
The last column of the data, wd (deg), gives the wind direction in units of degrees.
Angles do not make good model inputs: 360° and 0° should be close to each other and
wrap around smoothly. Direction shouldn't matter if the wind is not blowing.
wv = df.pop('wv (m/s)')
max_wv = df.pop('max. wv (m/s)')
# Convert to radians.
wd_rad = df.pop('wd (deg)')*np.pi / 180
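The popped columns are presumably recombined into something the model can use. A common choice, consistent with how the weather features are treated later in this chapter, is to convert the wind speed and direction into wind x/y vector components; a sketch, with the new column names being assumptions:

# Combine speed and direction into wind vector components
df['Wx'] = wv * np.cos(wd_rad)
df['Wy'] = wv * np.sin(wd_rad)
# Do the same for the maximum wind speed
df['max Wx'] = max_wv * np.cos(wd_rad)
df['max Wy'] = max_wv * np.sin(wd_rad)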
Time
Similarly, the Date Time column is very useful, but not in this string form. Start by
converting it to seconds:
timestamp_s = date_time.map(pd.Timestamp.timestamp)
Similar to the wind direction, the time in seconds is not a useful model input.
Being weather data, it has clear daily and yearly periodicity. There are many ways
you could deal with periodicity.
You can get usable signals by using sine and cosine transforms to clear "Time of
day" and "Time of year" signals:
day = 24*60*60
year = (365.2425)*day
df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day))
df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day))
df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year))
df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))
plt.plot(np.array(df['Day sin'])[:25])
plt.plot(np.array(df['Day cos'])[:25])
plt.xlabel('Time [h]')
plt.title('Time of day signal')
If you don't have that information, you can determine which frequencies are
important by extracting features with Fast Fourier Transform. To check the
assumptions, here is the tf.signal.rfft of the temperature over time. Note the
obvious peaks at frequencies near 1/year and 1/day:
fft = tf.signal.rfft(df['T (degC)'])
f_per_dataset = np.arange(0, len(fft))
n_samples_h = len(df['T (degC)'])
hours_per_year = 24*365.2524
years_per_dataset = n_samples_h/(hours_per_year)
f_per_year = f_per_dataset/years_per_dataset
plt.step(f_per_year, np.abs(fft))
plt.xscale('log')
plt.ylim(0, 400000)
plt.xlim([0.1, max(plt.xlim())])
plt.xticks([1, 365.2524], labels=['1/Year', '1/day'])
_ = plt.xlabel('Frequency (log scale)')
You'll use a (70%, 20%, 10%) split for the training, validation, and test sets. Note
the data is not being randomly shuffled before splitting. This is for two reasons:
1. It ensures that chopping the data into windows of consecutive samples is still
possible.
2. It ensures that the validation/test results are more realistic, being evaluated
on the data collected after the model was trained.
column_indices = {name: i for i, name in enumerate(df.columns)}
n = len(df)
train_df = df[0:int(n*0.7)]
val_df = df[int(n*0.7):int(n*0.9)]
test_df = df[int(n*0.9):]
num_features = df.shape[1]
The mean and standard deviation should only be computed using the training data
so that the models have no access to the values in the validation and test sets.
It's also arguable that the model shouldn't have access to future values in the
training set when training, and that this normalization should be done using
moving averages. That's not the focus of this tutorial, and the validation and test
sets ensure that you get (somewhat) honest metrics. So, in the interest of simplicity
this tutorial uses a simple average.
train_mean = train_df.mean()
train_std = train_df.std()
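The text says the training mean and standard deviation are used to normalize all three splits; the normalization step itself is not shown, so here is a minimal sketch of it:

train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std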
Now, peek at the distribution of the features. Some features do have long tails, but
there are no obvious errors like the -9999 wind velocity value.
The models in this tutorial will make a set of predictions based on a window of
consecutive samples from the data.
• The width (number of time steps) of the input and label windows.
• The time offset between them.
• Which features are used as inputs, labels, or both.
This tutorial builds a variety of models (including Linear, DNN, CNN and RNN models), and uses them for both single-output and multi-output predictions, and for both single-time-step and multi-time-step forecasts.
This section focuses on implementing the data windowing so that it can be reused
for all of those models.
Depending on the task and type of model you may want to generate a variety of
data windows. Here are some examples:
1. For example, to make a single prediction 24 hours into the future, given 24
hours of history, you might define a window like this:
2. A model that makes a prediction one hour into the future, given six hours of
history, would need a window like this:
The rest of this section defines a WindowGenerator class. This class can:
1. Handle the indexes and offsets of the input and label windows.
2. Split windows of features into (features, labels) pairs.
3. Plot the content of the resulting windows.
4. Efficiently generate batches of these windows from the training, evaluation, and test data, using tf.data.Datasets.
Start by creating the WindowGenerator class. The __init__ method includes all the necessary logic for the input and label indices.
It also takes the training, evaluation, and test DataFrames as input. These will be
converted to tf.data.Datasets of windows later.
class WindowGenerator():
  def __init__(self, input_width, label_width, shift,
               train_df=train_df, val_df=val_df, test_df=test_df,
               label_columns=None):
    # Store the raw data.
    self.train_df = train_df
    self.val_df = val_df
    self.test_df = test_df
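    # The rest of __init__ is not shown in the excerpt above. This is a sketch of the
    # index bookkeeping the later code relies on (label_columns_indices, column_indices,
    # total_window_size, input/label slices); attribute names follow what is used below.
    self.label_columns = label_columns
    if label_columns is not None:
      self.label_columns_indices = {name: i for i, name in enumerate(label_columns)}
    self.column_indices = {name: i for i, name in enumerate(train_df.columns)}

    # Work out the window parameters.
    self.input_width = input_width
    self.label_width = label_width
    self.shift = shift
    self.total_window_size = input_width + shift

    self.input_slice = slice(0, input_width)
    self.input_indices = np.arange(self.total_window_size)[self.input_slice]
    self.label_start = self.total_window_size - self.label_width
    self.labels_slice = slice(self.label_start, None)
    self.label_indices = np.arange(self.total_window_size)[self.labels_slice]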
Here is code to create the 2 windows shown in the diagrams at the start of this
section:
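The two window definitions themselves are missing from this excerpt; based on the descriptions above (24 hours of history predicting 24 hours ahead, and 6 hours of history predicting 1 hour ahead), they would look like this:

w1 = WindowGenerator(input_width=24, label_width=1, shift=24,
                     label_columns=['T (degC)'])
w2 = WindowGenerator(input_width=6, label_width=1, shift=1,
                     label_columns=['T (degC)'])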
Given a list of consecutive inputs, the split_window method will convert them to a
window of inputs and a window of labels.
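The body of split_window is not shown before it is attached to the class below; a sketch consistent with how it is described and used (slice out the inputs and labels, keep only the label columns, and fix the static shapes):

def split_window(self, features):
  inputs = features[:, self.input_slice, :]
  labels = features[:, self.labels_slice, :]
  if self.label_columns is not None:
    labels = tf.stack(
        [labels[:, :, self.column_indices[name]] for name in self.label_columns],
        axis=-1)
  # Slicing doesn't preserve static shape information, so set the shapes manually.
  inputs.set_shape([None, self.input_width, None])
  labels.set_shape([None, self.label_width, None])
  return inputs, labels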
WindowGenerator.split_window = split_window
Try it out:
# Stack three slices, the length of the total window.
example_window = tf.stack([np.array(train_df[:w2.total_window_size]),
np.array(train_df[100:100+w2.total_window_size]),
np.array(train_df[200:200+w2.total_window_size])])
Typically, data in TensorFlow is packed into arrays where the outermost index is
across examples (the "batch" dimension). The middle indices are the "time" or
"space" (width, height) dimension(s). The innermost indices are the features.
The code above took a batch of three 7-time step windows with 19 features at each
time step. It splits them into a batch of 6-time step 19-feature inputs, and a 1-time
step 1-feature label. The label only has one feature because
the WindowGenerator was initialized with label_columns=['T (degC)']. Initially,
this tutorial will build models that predict single output labels.
3. Plot
Here is a plot method that allows a simple visualization of the split window:
def plot(self, model=None, plot_col='T (degC)', max_subplots=3):
  inputs, labels = self.example
  plot_col_index = self.column_indices[plot_col]
  for n in range(min(max_subplots, len(inputs))):
    plt.subplot(max_subplots, 1, n + 1)
    plt.plot(self.input_indices, inputs[n, :, plot_col_index], label='Inputs', marker='.')
    if self.label_columns:
      label_col_index = self.label_columns_indices.get(plot_col, None)
    else:
      label_col_index = plot_col_index
    if label_col_index is None:
      continue
    plt.scatter(self.label_indices, labels[n, :, label_col_index], label='Labels')
    if model is not None:
      plt.scatter(self.label_indices, model(inputs)[n, :, label_col_index],
                  marker='X', label='Predictions')
    if n == 0:
      plt.legend()
  plt.xlabel('Time [h]')
WindowGenerator.plot = plot
This plot aligns inputs, labels, and (later) predictions based on the time that the
item refers to:
w2.plot()
You can plot the other columns, but the example window w2 configuration only
has labels for the T (degC) column.
w2.plot(plot_col='p (mbar)')
4. Create tf.data.Datasets
Finally, this make_dataset method will take a time series DataFrame and convert it
to a tf.data.Dataset of (input_window, label_window) pairs using
the tf.keras.utils.timeseries_dataset_from_array function:
def make_dataset(self, data):
  data = np.array(data, dtype=np.float32)
  ds = tf.keras.utils.timeseries_dataset_from_array(
      data=data, targets=None, sequence_length=self.total_window_size,
      sequence_stride=1, shuffle=True, batch_size=32)
  ds = ds.map(self.split_window)
  return ds
WindowGenerator.make_dataset = make_dataset
@property
def train(self):
  return self.make_dataset(self.train_df)

@property
def val(self):
  return self.make_dataset(self.val_df)

@property
def test(self):
  return self.make_dataset(self.test_df)

@property
def example(self):
  """Get and cache an example batch of `inputs, labels` for plotting."""
  result = getattr(self, '_example', None)
  if result is None:
    # No example batch was found, so get one from the `.train` dataset
    result = next(iter(self.train))
    # And cache it for next time
    self._example = result
  return result

WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example
Now, the WindowGenerator object gives you access to the tf.data.Dataset objects,
so you can easily iterate over the data.
The Dataset.element_spec property tells you the structure, data types, and shapes
of the dataset elements.
The simplest model you can build on this sort of data is one that predicts a single
feature's value—1 time step (one hour) into the future based only on the current
conditions.
So, start by building models to predict the T (degC) value one hour into the future.
Configure a WindowGenerator object to produce these single-step (input,
label) pairs:
single_step_window = WindowGenerator(
input_width=1, label_width=1, shift=1,
label_columns=['T (degC)'])
single_step_window
The window object creates tf.data.Datasets from the training, validation, and test
sets, allowing you to easily iterate over batches of data.
This first task is to predict temperature one hour into the future, given the current
value of all features. The current values include the current temperature.
So, start with a model that just returns the current temperature as the prediction,
predicting "No change". This is a reasonable baseline since temperature changes
slowly. Of course, this baseline will work less well if you make a prediction further
in the future.
class Baseline(tf.keras.Model):
  def __init__(self, label_index=None):
    super().__init__()
    self.label_index = label_index
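  # The call method is not shown in the excerpt above; the baseline simply returns the
  # current value of the selected column as its prediction ("no change"). A sketch:
  def call(self, inputs):
    if self.label_index is None:
      return inputs
    result = inputs[:, :, self.label_index]
    return result[:, :, tf.newaxis]

# Instantiate the baseline to predict the temperature column, as the next lines assume:
baseline = Baseline(label_index=column_indices['T (degC)'])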
baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
metrics=[tf.keras.metrics.MeanAbsoluteError()])
val_performance = {}
performance = {}
val_performance['Baseline'] = baseline.evaluate(single_step_window.val)
performance['Baseline'] = baseline.evaluate(single_step_window.test, verbose=0)
That printed some performance metrics, but those don't give you a feeling for how
well the model is doing.
The WindowGenerator has a plot method, but the plots won't be very interesting
with only a single sample.
wide_window = WindowGenerator(
input_width=24, label_width=24, shift=1,
label_columns=['T (degC)'])
wide_window
By plotting the baseline model's predictions, notice that it is simply the labels
shifted right by one hour:
wide_window.plot(baseline)
In the above plots of three examples the single step model is run over the course of
24 hours. This deserves some explanation:
• The blue Inputs line shows the input temperature at each time step. The
model receives all features, this plot only shows the temperature.
• The green Labels dots show the target prediction value. These dots are
shown at the prediction time, not the input time. That is why the range of
labels is shifted 1 step relative to the inputs.
• The orange Predictions crosses are the model's prediction's for each output
time step. If the model were predicting perfectly the predictions would land
directly on the Labels.
Linear model
The simplest trainable model you can apply to this task is to insert linear
transformation between the input and output. In this case the output from a time
step only depends on that step:
A tf.keras.layers.Dense layer with no activation set is a linear model. The layer
only transforms the last axis of the data from (batch, time, inputs) to (batch, time,
units); it is applied independently to every item across the batch and time axes.
linear = tf.keras.Sequential([
tf.keras.layers.Dense(units=1)
])
This tutorial trains many models, so package the training procedure into a function:
MAX_EPOCHS = 20
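The training helper itself is not shown in this excerpt; it is referenced later as compile_and_fit (for example with the LSTM). A sketch consistent with that usage, with early stopping as a reasonable assumption:

def compile_and_fit(model, window, patience=2):
  # Stop training when the validation loss stops improving.
  early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                    patience=patience,
                                                    mode='min')
  model.compile(loss=tf.keras.losses.MeanSquaredError(),
                optimizer=tf.keras.optimizers.Adam(),
                metrics=[tf.keras.metrics.MeanAbsoluteError()])
  history = model.fit(window.train, epochs=MAX_EPOCHS,
                      validation_data=window.val,
                      callbacks=[early_stopping])
  return history

history = compile_and_fit(linear, single_step_window)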
val_performance['Linear'] = linear.evaluate(single_step_window.val)
performance['Linear'] = linear.evaluate(single_step_window.test, verbose=0)
Epoch 1/20
1534/1534 [==============================] - 5s 3ms/step - loss: 0.2398
- mean_absolute_error: 0.2786 - val_loss: 0.0124 - val_mean_absolute_error:
0.0838
Epoch 2/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0111
- mean_absolute_error: 0.0786 - val_loss: 0.0102 - val_mean_absolute_error:
0.0757
Epoch 3/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0097
- mean_absolute_error: 0.0730 - val_loss: 0.0091 - val_mean_absolute_error:
0.0712
Epoch 4/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0092
- mean_absolute_error: 0.0705 - val_loss: 0.0088 - val_mean_absolute_error:
0.0695
Epoch 5/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091
- mean_absolute_error: 0.0699 - val_loss: 0.0089 - val_mean_absolute_error:
0.0701
Epoch 6/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091
- mean_absolute_error: 0.0698 - val_loss: 0.0088 - val_mean_absolute_error:
0.0696
Epoch 7/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091
- mean_absolute_error: 0.0697 - val_loss: 0.0088 - val_mean_absolute_error:
0.0694
Epoch 8/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090
- mean_absolute_error: 0.0697 - val_loss: 0.0087 - val_mean_absolute_error:
0.0688
Epoch 9/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090
- mean_absolute_error: 0.0697 - val_loss: 0.0087 - val_mean_absolute_error:
0.0696
Epoch 10/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090
- mean_absolute_error: 0.0696 - val_loss: 0.0088 - val_mean_absolute_error:
0.0692
Epoch 11/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090
- mean_absolute_error: 0.0696 - val_loss: 0.0087 - val_mean_absolute_error:
0.0691
Epoch 12/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090
- mean_absolute_error: 0.0696 - val_loss: 0.0088 - val_mean_absolute_error:
0.0699
Epoch 13/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090
- mean_absolute_error: 0.0695 - val_loss: 0.0088 - val_mean_absolute_error:
0.0697
439/439 [==============================] - 1s 2ms/step - loss: 0.0088 -
mean_absolute_error: 0.0697
Like the baseline model, the linear model can be called on batches of wide
windows. Used this way the model makes a set of independent predictions on
consecutive time steps. The time axis acts like another batch axis. There are no
interactions between the predictions at each time step.
print('Input shape:', wide_window.example[0].shape)
print('Output shape:', linear(wide_window.example[0]).shape)
Here is the plot of its example predictions on the wide_window, note how in many
cases the prediction is clearly better than just returning the input temperature, but
in a few cases it's worse:
wide_window.plot(linear)
One advantage to linear models is that they're relatively simple to interpret. You
can pull out the layer's weights and visualize the weight assigned to each input:
plt.bar(x = range(len(train_df.columns)),
height=linear.layers[0].kernel[:,0].numpy())
axis = plt.gca()
axis.set_xticks(range(len(train_df.columns)))
_ = axis.set_xticklabels(train_df.columns, rotation=90)
Sometimes the model doesn't even place the most weight on the input T (degC).
This is one of the risks of random initialization.
Dense
Before applying models that actually operate on multiple time-steps, it's worth
checking the performance of deeper, more powerful, single input step models.
dense = tf.keras.Sequential([
tf.keras.layers.Dense(units=64, activation='relu'),
tf.keras.layers.Dense(units=64, activation='relu'),
tf.keras.layers.Dense(units=1)
])
val_performance['Dense'] = dense.evaluate(single_step_window.val)
performance['Dense'] = dense.evaluate(single_step_window.test, verbose=0)
Epoch 1/20
1534/1534 [==============================] - 7s 4ms/step - loss: 0.0177
- mean_absolute_error: 0.0793 - val_loss: 0.0080 - val_mean_absolute_error:
0.0655
Epoch 2/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0079
- mean_absolute_error: 0.0648 - val_loss: 0.0072 - val_mean_absolute_error:
0.0608
Epoch 3/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0076
- mean_absolute_error: 0.0630 - val_loss: 0.0070 - val_mean_absolute_error:
0.0596
Epoch 4/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0073
- mean_absolute_error: 0.0611 - val_loss: 0.0065 - val_mean_absolute_error:
0.0566
Epoch 5/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0070
- mean_absolute_error: 0.0600 - val_loss: 0.0070 - val_mean_absolute_error:
0.0588
Epoch 6/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0069
- mean_absolute_error: 0.0589 - val_loss: 0.0075 - val_mean_absolute_error:
0.0636
439/439 [==============================] - 1s 2ms/step - loss: 0.0075 -
mean_absolute_error: 0.0636
Multi-step dense
A single-time-step model has no context for the current values of its inputs. It can't
see how the input features are changing over time. To address this issue the model
needs access to multiple time steps when making predictions:
The baseline, linear and dense models handled each time step independently. Here
the model will take multiple time steps as input to produce a single output.
Create a WindowGenerator that will produce batches of three-hour inputs and one-
hour labels:
Note that the Window's shift parameter is relative to the end of the two windows.
CONV_WIDTH = 3
conv_window = WindowGenerator(
input_width=CONV_WIDTH,
label_width=1,
shift=1,
label_columns=['T (degC)'])
conv_window
Text(0.5, 1.0, 'Given 3 hours of inputs, predict 1 hour into the future.')
multi_step_dense = tf.keras.Sequential([
# Shape: (time, features) => (time*features)
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(units=32, activation='relu'),
tf.keras.layers.Dense(units=32, activation='relu'),
tf.keras.layers.Dense(units=1),
# Add back the time dimension.
# Shape: (outputs) => (1, outputs)
tf.keras.layers.Reshape([1, -1]),
])
print('Input shape:', conv_window.example[0].shape)
print('Output shape:', multi_step_dense(conv_window.example[0]).shape)
IPython.display.clear_output()
val_performance['Multi step dense'] =
multi_step_dense.evaluate(conv_window.val)
performance['Multi step dense'] = multi_step_dense.evaluate(conv_window.test,
verbose=0)
The main down-side of this approach is that the resulting model can only be
executed on input windows of exactly this shape.
print('Input shape:', wide_window.example[0].shape)
try:
print('Output shape:', multi_step_dense(wide_window.example[0]).shape)
except Exception as e:
print(f'\n{type(e).__name__}:{e}')
Input 0 of layer "dense_4" is incompatible with the layer: expected axis -1 of input
shape to have value 57, but received input with shape (32, 456)
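The next lines train and evaluate a conv_model that is not defined in this excerpt. Based on the later description (a convolutional layer applied to a sliding window of CONV_WIDTH inputs), a sketch of what it would look like, with layer sizes as assumptions:

conv_model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=32,
                           kernel_size=(CONV_WIDTH,),
                           activation='relu'),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=1),
])
history = compile_and_fit(conv_model, conv_window)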
Run it on an example batch to check that the model produces outputs with the
expected shape:
Train and evaluate it on the conv_window and it should give performance similar
to the multi_step_dense model.
IPython.display.clear_output()
val_performance['Conv'] = conv_model.evaluate(conv_window.val)
performance['Conv'] = conv_model.evaluate(conv_window.test, verbose=0)
The difference between this conv_model and the multi_step_dense model is that
the conv_model can be run on inputs of any length. The convolutional layer is
applied to a sliding window of inputs:
If you run it on wider input, it produces wider output:
print("Wide window")
print('Input shape:', wide_window.example[0].shape)
print('Labels shape:', wide_window.example[1].shape)
print('Output shape:', conv_model(wide_window.example[0]).shape)
Wide window
Input shape: (32, 24, 19)
Labels shape: (32, 24, 1)
Output shape: (32, 22, 1)
Note that the output is shorter than the input. To make training or plotting work,
you need the labels, and prediction to have the same length. So build
a WindowGenerator to produce wide windows with a few extra input time steps so
the label and prediction lengths match:
LABEL_WIDTH = 24
INPUT_WIDTH = LABEL_WIDTH + (CONV_WIDTH - 1)
wide_conv_window = WindowGenerator(
input_width=INPUT_WIDTH,
label_width=LABEL_WIDTH,
shift=1,
label_columns=['T (degC)'])
wide_conv_window
Now, you can plot the model's predictions on a wider window. Note the 3 input
time steps before the first prediction. Every prediction here is based on the 3
preceding time steps:
wide_conv_window.plot(conv_model)
Recurrent neural network
You can learn more in the Text generation with an RNN tutorial and the Recurrent
Neural Networks (RNN) with Keras guide.
In this tutorial, you will use an RNN layer called Long Short-Term Memory
(tf.keras.layers.LSTM).
An important constructor argument for all Keras RNN layers is return_sequences, which can configure the layer in one of two ways:
1. If False, the default, the layer only returns the output of the final time step, giving the model time to warm up its internal state before making a single prediction.
2. If True, the layer returns an output for each input. This is useful for:
• Stacking RNN layers.
• Training a model on multiple time steps simultaneously.
lstm_model = tf.keras.models.Sequential([
# Shape [batch, time, features] => [batch, time, lstm_units]
tf.keras.layers.LSTM(32, return_sequences=True),
# Shape => [batch, time, features]
tf.keras.layers.Dense(units=1)
])
Note: This will give a pessimistic view of the model's performance. On the first
time step, the model has no access to previous steps and, therefore, can't do any
better than the simple linear and dense models shown earlier.
IPython.display.clear_output()
val_performance['LSTM'] = lstm_model.evaluate(wide_window.val)
performance['LSTM'] = lstm_model.evaluate(wide_window.test, verbose=0)
Performance
With this dataset typically each of the models does slightly better than the one
before it:
x = np.arange(len(performance))
width = 0.3
metric_name = 'mean_absolute_error'
metric_index = lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in val_performance.values()]
test_mae = [v[metric_index] for v in performance.values()]
The models so far all predicted a single output feature, T (degC), for a single time
step.
All of these models can be converted to predict multiple features just by changing
the number of units in the output layer and adjusting the training windows to
include all features in the labels (example_labels):
single_step_window = WindowGenerator(
# `WindowGenerator` returns all features as labels if you
# don't set the `label_columns` argument.
input_width=1, label_width=1, shift=1)
wide_window = WindowGenerator(
input_width=24, label_width=24, shift=1)
Note above that the features axis of the labels now has the same depth as the
inputs, instead of 1.
Baseline
The same baseline model (Baseline) can be used here, but this time repeating all
features instead of selecting a specific label_index:
baseline = Baseline()
baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
metrics=[tf.keras.metrics.MeanAbsoluteError()])
val_performance = {}
performance = {}
val_performance['Baseline'] = baseline.evaluate(wide_window.val)
performance['Baseline'] = baseline.evaluate(wide_window.test, verbose=0)
Dense
dense = tf.keras.Sequential([
tf.keras.layers.Dense(units=64, activation='relu'),
tf.keras.layers.Dense(units=64, activation='relu'),
tf.keras.layers.Dense(units=num_features)
])
IPython.display.clear_output()
val_performance['Dense'] = dense.evaluate(single_step_window.val)
performance['Dense'] = dense.evaluate(single_step_window.test, verbose=0)
RNN
%%time
wide_window = WindowGenerator(
input_width=24, label_width=24, shift=1)
lstm_model = tf.keras.models.Sequential([
# Shape [batch, time, features] => [batch, time, lstm_units]
tf.keras.layers.LSTM(32, return_sequences=True),
# Shape => [batch, time, features]
tf.keras.layers.Dense(units=num_features)
])
history = compile_and_fit(lstm_model, wide_window)
IPython.display.clear_output()
val_performance['LSTM'] = lstm_model.evaluate( wide_window.val)
performance['LSTM'] = lstm_model.evaluate( wide_window.test, verbose=0)
print()
The Baseline model from earlier took advantage of the fact that the sequence doesn't change drastically from time step to time step. Every model trained in this tutorial so far was randomly initialized, and then had to learn that the output is a small change from the previous time step.
While you can get around this issue with careful initialization, it's simpler to build
this into the model structure.
It's common in time series analysis to build models that instead of predicting the
next value, predict how the value will change in the next time step.
Similarly, residual networks—or ResNets—in deep learning refer to architectures
where each layer adds to the model's accumulating result.
That is how you take advantage of the knowledge that the change should be small.
Essentially, this initializes the model to match the Baseline. For this task it helps
models converge faster, with slightly better performance.
This approach can be used in conjunction with any model discussed in this tutorial.
class ResidualWrapper(tf.keras.Model):
  def __init__(self, model):
    super().__init__()
    self.model = model
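  # The call method is not shown in the excerpt above; the wrapper asks the inner model
  # for a delta and adds it to the input, so predictions start out close to "no change".
  # A sketch:
  def call(self, inputs, *args, **kwargs):
    delta = self.model(inputs, *args, **kwargs)
    # The prediction for each time step is the input from the same time step
    # plus the delta calculated by the wrapped model.
    return inputs + delta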
%%time
residual_lstm = ResidualWrapper(
tf.keras.Sequential([
tf.keras.layers.LSTM(32, return_sequences=True),
tf.keras.layers.Dense(
num_features,
# The predicted deltas should start small.
# Therefore, initialize the output layer with zeros.
kernel_initializer=tf.initializers.zeros())
]))
IPython.display.clear_output()
val_performance['Residual LSTM'] = residual_lstm.evaluate(wide_window.val)
performance['Residual LSTM'] = residual_lstm.evaluate(wide_window.test,
verbose=0)
print()
CPU times: user 1min 54s, sys: 21.4 s, total: 2min 15s
Wall time: 51.3 s
Performance
x = np.arange(len(performance))
width = 0.3
metric_name = 'mean_absolute_error'
metric_index = lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in val_performance.values()]
test_mae = [v[metric_index] for v in performance.values()]
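The formatted numbers below are presumably produced by a small print loop over the performance dictionary; a sketch of it:

for name, value in performance.items():
  print(f'{name:16s}: {value[metric_index]:0.4f}')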
Baseline : 0.1638
Dense : 0.1319
LSTM : 0.1217
Residual LSTM : 0.1193
This section looks at how to expand these models to make multiple time step
predictions.
1. Single shot predictions where the entire time series is predicted at once.
2. Autoregressive predictions where the model only makes single step
predictions and its output is fed back as its input.
In this section all the models will predict all the features across all output time
steps.
For the multi-step model, the training data again consists of hourly samples.
However, here, the models will learn to predict 24 hours into the future, given 24
hours of the past.
Here is a Window object that generates these slices from the dataset:
OUT_STEPS = 24
multi_window = WindowGenerator(input_width=24,
label_width=OUT_STEPS,
shift=OUT_STEPS)
multi_window.plot()
multi_window
A simple baseline for this task is to repeat the last input time step for the required
number of output time steps:
class MultiStepLastBaseline(tf.keras.Model):
def call(self, inputs):
return tf.tile(inputs[:, -1:, :], [1, OUT_STEPS, 1])
last_baseline = MultiStepLastBaseline()
last_baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
metrics=[tf.keras.metrics.MeanAbsoluteError()])
multi_val_performance = {}
multi_performance = {}
multi_val_performance['Last'] = last_baseline.evaluate(multi_window.val)
multi_performance['Last'] = last_baseline.evaluate(multi_window.test, verbose=0)
multi_window.plot(last_baseline)
class RepeatBaseline(tf.keras.Model):
def call(self, inputs):
return inputs
repeat_baseline = RepeatBaseline()
repeat_baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
metrics=[tf.keras.metrics.MeanAbsoluteError()])
multi_val_performance['Repeat'] = repeat_baseline.evaluate(multi_window.val)
multi_performance['Repeat'] = repeat_baseline.evaluate(multi_window.test,
verbose=0)
multi_window.plot(repeat_baseline)
Single-shot models
One high-level approach to this problem is to use a "single-shot" model, where the
model makes the entire sequence prediction in a single step.
This can be implemented efficiently as
a tf.keras.layers.Dense with OUT_STEPS*features output units. The model just
needs to reshape that output to the required (OUTPUT_STEPS, features).
Linear
A simple linear model based on the last input time step does better than either
baseline, but is underpowered. The model needs to predict OUTPUT_STEPS time
steps, from a single input time step with a linear projection. It can only capture a
low-dimensional slice of the behavior, likely based mainly on the time of day and
time of year.
multi_linear_model = tf.keras.Sequential([
# Take the last time-step.
# Shape [batch, time, features] => [batch, 1, features]
tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
# Shape => [batch, 1, out_steps*features]
tf.keras.layers.Dense(OUT_STEPS*num_features,
kernel_initializer=tf.initializers.zeros()),
# Shape => [batch, out_steps, features]
tf.keras.layers.Reshape([OUT_STEPS, num_features])
])
IPython.display.clear_output()
multi_val_performance['Linear'] =
multi_linear_model.evaluate(multi_window.val)
multi_performance['Linear'] = multi_linear_model.evaluate(multi_window.test,
verbose=0)
multi_window.plot(multi_linear_model)
Dense
Adding a tf.keras.layers.Dense between the input and output gives the linear model
more power, but is still only based on a single input time step.
multi_dense_model = tf.keras.Sequential([
# Take the last time step.
# Shape [batch, time, features] => [batch, 1, features]
tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
# Shape => [batch, 1, dense_units]
tf.keras.layers.Dense(512, activation='relu'),
# Shape => [batch, out_steps*features]
tf.keras.layers.Dense(OUT_STEPS*num_features,
kernel_initializer=tf.initializers.zeros()),
# Shape => [batch, out_steps, features]
tf.keras.layers.Reshape([OUT_STEPS, num_features])
])
IPython.display.clear_output()
multi_val_performance['Dense'] = multi_dense_model.evaluate(multi_window.val)
multi_performance['Dense'] = multi_dense_model.evaluate(multi_window.test,
verbose=0)
multi_window.plot(multi_dense_model)
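The next lines evaluate a multi_conv_model that is not defined in this excerpt. A sketch of the multi-step CNN it presumably refers to (a Conv1D over the last CONV_WIDTH time steps, projected to all output steps), with the filter count as an assumption:

CONV_WIDTH = 3
multi_conv_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, CONV_WIDTH, features]
    tf.keras.layers.Lambda(lambda x: x[:, -CONV_WIDTH:, :]),
    # Shape => [batch, 1, conv_units]
    tf.keras.layers.Conv1D(256, activation='relu', kernel_size=(CONV_WIDTH)),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])
history = compile_and_fit(multi_conv_model, multi_window)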
IPython.display.clear_output()
multi_val_performance['Conv'] = multi_conv_model.evaluate(multi_window.val)
multi_performance['Conv'] = multi_conv_model.evaluate(multi_window.test,
verbose=0)
multi_window.plot(multi_conv_model)
RNN
A recurrent model can learn to use a long history of inputs, if it's relevant to the
predictions the model is making. Here the model will accumulate internal state for
24 hours, before making a single prediction for the next 24 hours.
In this single-shot format, the LSTM only needs to produce an output at the last
time step, so set return_sequences=False in tf.keras.layers.LSTM.
multi_lstm_model = tf.keras.Sequential([
# Shape [batch, time, features] => [batch, lstm_units].
# Adding more `lstm_units` just overfits more quickly.
tf.keras.layers.LSTM(32, return_sequences=False),
# Shape => [batch, out_steps*features].
tf.keras.layers.Dense(OUT_STEPS*num_features,
kernel_initializer=tf.initializers.zeros()),
# Shape => [batch, out_steps, features].
tf.keras.layers.Reshape([OUT_STEPS, num_features])
])
IPython.display.clear_output()
multi_val_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.val)
multi_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.test,
verbose=0)
multi_window.plot(multi_lstm_model)
437/437 [==============================] - 1s 3ms/step - loss: 0.2145 -
mean_absolute_error: 0.2844
The above models all predict the entire output sequence in a single step.
In some cases it may be helpful for the model to decompose this prediction into
individual time steps. Then, each model's output can be fed back into itself at each
step and predictions can be made conditioned on the previous one, like in the
classic Generating Sequences With Recurrent Neural Networks.
One clear advantage to this style of model is that it can be set up to produce output
with a varying length.
You could take any of the single-step multi-output models trained in the first half
of this tutorial and run in an autoregressive feedback loop, but here you'll focus on
building a model that's been explicitly trained to do that.
RNN
This tutorial only builds an autoregressive RNN model, but this pattern could be
applied to any model that was designed to output a single time step.
The model will have the same basic form as the single-step LSTM models from
earlier: a tf.keras.layers.LSTM layer followed by a tf.keras.layers.Dense layer that
converts the LSTM layer's outputs to model predictions.
In this case, the model has to manually manage the inputs for each step, so it
uses tf.keras.layers.LSTMCell directly for the lower level, single time step
interface.
class FeedBack(tf.keras.Model):
def __init__(self, units, out_steps):
super().__init__()
self.out_steps = out_steps
self.units = units
self.lstm_cell = tf.keras.layers.LSTMCell(units)
# Also wrap the LSTMCell in an RNN to simplify the `warmup` method.
self.lstm_rnn = tf.keras.layers.RNN(self.lstm_cell, return_state=True)
self.dense = tf.keras.layers.Dense(num_features)
The first method this model needs is a warmup method to initialize its internal state
based on the inputs. Once trained, this state will capture the relevant parts of the
input history. This is equivalent to the single-step LSTM model from earlier:
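The definition attached on the next line is not shown in this excerpt; a sketch consistent with the description (run the wrapped RNN over the inputs once and return the first prediction together with the LSTM state). The feedback_model instance used later is also created here as an assumption:

def warmup(self, inputs):
  # inputs.shape => (batch, time, features)
  # x.shape => (batch, lstm_units)
  x, *state = self.lstm_rnn(inputs)
  # prediction.shape => (batch, features)
  prediction = self.dense(x)
  return prediction, state

feedback_model = FeedBack(units=32, out_steps=OUT_STEPS)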
FeedBack.warmup = warmup
This method returns a single time-step prediction and the internal state of
the LSTM:
TensorShape([32, 19])
With the RNN's state, and an initial prediction you can now continue iterating the
model feeding the predictions at each step back as the input.
The simplest approach for collecting the output predictions is to use a Python list
and a tf.stack after the loop.
Note: Stacking a Python list like this only works with eager-execution,
using Model.compile(..., run_eagerly=True) for training, or with a fixed length
output. For a dynamic output length, you would need to use
a tf.TensorArray instead of a Python list, and tf.range instead of the
Python range.
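Again, the definition itself is missing before it is attached below; a sketch that follows the description (warm up, then loop out_steps − 1 times feeding each prediction back in, collecting the outputs in a Python list and stacking them):

def call(self, inputs, training=None):
  predictions = []
  # Initialize the LSTM state and get the first prediction from the warmup inputs.
  prediction, state = self.warmup(inputs)
  predictions.append(prediction)
  # Feed each prediction back in as the next input.
  for n in range(1, self.out_steps):
    x = prediction
    x, state = self.lstm_cell(x, states=state, training=training)
    prediction = self.dense(x)
    predictions.append(prediction)
  # predictions.shape => (time, batch, features); transpose to (batch, time, features).
  predictions = tf.stack(predictions)
  predictions = tf.transpose(predictions, [1, 0, 2])
  return predictions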
FeedBack.call = call
IPython.display.clear_output()
multi_val_performance['AR LSTM'] =
feedback_model.evaluate(multi_window.val)
multi_performance['AR LSTM'] = feedback_model.evaluate(multi_window.test,
verbose=0)
multi_window.plot(feedback_model)
Performance
metric_name = 'mean_absolute_error'
metric_index = lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in multi_val_performance.values()]
test_mae = [v[metric_index] for v in multi_performance.values()]
Last : 0.5157
Repeat : 0.3774
Linear : 0.2979
Dense : 0.2762
Conv : 0.2765
LSTM : 0.2772
AR LSTM : 0.2969
The gains achieved going from a dense model to convolutional and recurrent
models are only a few percent (if any), and the autoregressive model performed
clearly worse. So these more complex approaches may not be worthwhile on this problem, but there was no way to know without trying, and these models could be helpful for your problem.
References
Duke University, Summary of Rules for Identifying ARIMA Models.
Global Temperature Time Series Data.
Holmes, Scheuerell & Ward, Applied Time Series Analysis.
Hyndman & Athanasopoulos, Forecasting: Principles and Practice.
Khan, ARIMA Model for Forecasting - Example in R.
Towards Data Science, The Complete Guide to Time Series Analysis and Forecasting.
Curriculum Vitae of Dr. Ahmed Fawzy Hassan Ghonaim
Personal Data
Work mailing address: Department of Mathematics, Faculty of Science, Helwan University, Ain Helwan, P.O. Box 11795, Cairo, Egypt
Faculty telephone: 25552468
Fax: 25552468
Career history:
From 5/1/1992: Demonstrator, Department of Mathematics, Faculty of Science, Helwan University.
From 19/3/1998: Assistant Lecturer, Department of Mathematics, Faculty of Science, Helwan University.
From 30/3/2003 until now: Lecturer, Department of Mathematics, Faculty of Science, Helwan University.
From 2/2005 to 9/2005: Assistant Professor, Faculty of Arts and Science, Sebha University, Libya.
From 9/11/2007 to 4/6/2018: Assistant Professor, Community College, Al-Kharj, Prince Sattam bin Abdulaziz University, Saudi Arabia.
Membership of societies:
Published research:
1. Emad M. Abo El-Dahab and Ahmed F. Ghonaim, "Convective heat transfer in an electrically conducting micropolar fluid at a stretching surface with uniform free stream", Journal of Applied Mathematics and Computation, 137 (2003) 323-326.
3. Emad M. Abo El-Dahab and Ahmed F. Ghonaim, "Radiation effect on heat transfer of a micropolar fluid through a porous medium", accepted at Applied Mathematics and Computation.
2- At Helwan University:
Thinking skills
Using technology in teaching
Designing a university course
Legal aspects
Financial aspects
Economics of scientific research
Modern trends in teaching
Accreditation and quality
Teaching large classes and micro-teaching