Unit 4
Time series analysis is a powerful statistical method that examines data points collected at regular
intervals to uncover underlying patterns and trends. This technique is highly relevant across various
industries, as it enables informed decision making and accurate forecasting based on historical data. By
understanding the past and predicting the future, time series analysis plays a crucial role in fields such as
finance, health care, energy, supply chain management, weather forecasting, marketing, and beyond.
At its core, time series analysis focuses on studying and interpreting a sequence of data points recorded or
collected at consistent time intervals. Unlike cross-sectional data, which captures a snapshot in time, time
series data is fundamentally dynamic, evolving over chronological sequences both short and extremely
long. This type of analysis is pivotal in uncovering underlying structures within the data, such as trends,
cycles, and seasonal variations.
Time series data generally comprises different components that characterize the patterns and
behavior of the data over time. By analyzing these components, we can better understand the dynamics of
the time series and create more accurate models. Four main elements make up a time series dataset:
Trends
Seasonality
Cycles
Noise
Trends show the general direction of the data, and whether it is increasing,
decreasing, or remaining stationary over an extended period of time. Trends
indicate the long-term movement in the data and can reveal overall growth or
decline. For example, e-commerce sales may show an upward trend over the
last five years.
Seasonality refers to predictable patterns that recur regularly, like yearly retail
spikes during the holiday season. Seasonal components exhibit fluctuations fixed
in timing, direction, and magnitude. For instance, electricity usage may surge
every summer as people turn on their air conditioners.
Cycles are longer-term fluctuations that recur without a fixed period, such as business
cycles of expansion and contraction. Unlike seasonality, their timing and length vary.
Finally, noise encompasses the residual variability in the data that the other
components cannot explain. Noise includes unpredictable, erratic deviations after
accounting for trends, seasonality, and cycles.
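The sketch below is a minimal illustration of these components, assuming a synthetic monthly series built from a linear trend, a yearly seasonal pattern, and random noise; the seasonal_decompose helper from statsmodels is used here as one possible decomposition routine, and all numeric settings are illustrative.
Python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series: trend + seasonality + noise (illustrative values only)
rng = np.random.default_rng(0)
months = pd.date_range("2015-01-01", periods=96, freq="MS")
trend = np.linspace(100, 200, 96)                        # steady upward trend
seasonal = 10 * np.sin(2 * np.pi * np.arange(96) / 12)   # 12-month seasonal pattern
noise = rng.normal(scale=3, size=96)                     # irregular component
series = pd.Series(trend + seasonal + noise, index=months)

# Additive decomposition into trend, seasonal, and residual (noise) parts
result = seasonal_decompose(series, model="additive", period=12)
print(result.trend.dropna().head())
print(result.seasonal.head(12))
print(result.resid.dropna().head())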
Time series analysis is a statistical technique used to analyze data points gathered at consistent intervals
over a time span in order to detect patterns and trends. Understanding the fundamental framework of the
data can assist in predicting future data points and making knowledgeable choices.
TSA is the backbone of prediction and forecasting analysis for time-based problem statements: it works by
understanding and matching the current situation with patterns derived from previous stages.
With the help of time series, we can prepare numerous time-based analyses and results:
Descriptive analysis: analysis of a given dataset to find out what is in it.
Seasonality: regular or fixed-interval shifts within the dataset along a continuous timeline, often
appearing as a bell-curve or saw-tooth pattern.
Cyclical: movement with no fixed interval and uncertainty in its timing and pattern.
Time series analysis has the limitations mentioned below, which we have to take care of during data analysis:
Like many other models, TSA does not support missing values.
Let’s discuss the time series data types and their influence. There are two major types of time series
data: stationary and non-stationary.
Stationary: A dataset should follow the thumb rules below, without having the trend, seasonality, cyclical,
and irregularity components of the time series:
The mean, variance, and covariance of the series should remain constant over the period of analysis.
Non-stationary: If the mean, variance, or covariance is changing with respect to time, the dataset is
called non-stationary.
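As a rough, hedged illustration of how stationarity is often checked in practice, the sketch below applies the augmented Dickey-Fuller test from statsmodels to a simulated random walk (non-stationary) and to its first difference (stationary); the data are synthetic.
Python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
random_walk = np.cumsum(rng.normal(size=500))   # non-stationary: mean drifts over time
differenced = np.diff(random_walk)              # first difference is stationary

for name, series in [("random walk", random_walk), ("first difference", differenced)]:
    stat, p_value, *_ = adfuller(series)
    # A small p-value (e.g. < 0.05) suggests rejecting the unit-root (non-stationarity) hypothesis
    print(f"{name}: ADF statistic = {stat:.3f}, p-value = {p_value:.3f}")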
Viewed as a collection of random variables over time, we have a time series {r_t}. Linear time series analysis
provides a natural framework to study the dynamic structure of such a series. The theories of linear time
series discussed include stationarity, dynamic dependence, the autocorrelation function, modeling, and
forecasting. The econometric models introduced include (a) simple autoregressive (AR) models, (b)
simple moving-average (MA) models, (c) mixed autoregressive moving-average (ARMA) models, (d)
seasonal models, (e) unit-root nonstationarity, (f) regression models with time series errors, and (g)
fractionally differenced models for long-range dependence. For an asset return r_t, simple ...
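As a hedged sketch of the simple AR/ARMA workflow described above, the code below simulates an AR(1) series and fits it with statsmodels' ARIMA class (an ARMA model corresponds to an ARIMA model with no differencing); the coefficient value is an illustrative assumption.
Python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate an AR(1) process: r_t = 0.6 * r_{t-1} + e_t (illustrative coefficient)
rng = np.random.default_rng(2)
n = 500
r = np.zeros(n)
for t in range(1, n):
    r[t] = 0.6 * r[t - 1] + rng.normal()

# Fit an AR(1) model, i.e. ARIMA with order (p=1, d=0, q=0)
model = ARIMA(r, order=(1, 0, 0))
result = model.fit()
print(result.params)             # estimated constant and AR coefficient
print(result.forecast(steps=5))  # five-step-ahead forecast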
Nonlinear Time Series Models
Time series can have many patterns, including trends, seasonality, cycles, and irregularity. When
analyzing time series data, it is crucial to detect these patterns, understand their possible causes and
relationships, and know which algorithms can model and forecast each pattern.
A trend can be linear or nonlinear. A linear trend refers to a consistent upward or downward
movement in the data over a period of time.
Nonlinear time series models are indispensable for analyzing and predicting data where the relationship
between variables is not linear. These models adeptly capture intricate patterns and dependencies in time
series data, making them the ideal choice for various real-world phenomena where linear models are
insufficient.
Non-linearity: Non-linear time series models are used to capture intricate relationships in time
series data that linear models are unable to capture. These models are essential for accurately
representing and predicting behaviors in data where changes are not proportional to the inputs. In
this discussion, we will explore common types of non-linear time series models, provide a
detailed example with code, and visualize the results. The common types of non-linear time series
models include Threshold Autoregressive (TAR) Models, Smooth Transition Autoregressive
(STAR) Models, Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized ARCH
(GARCH) Models, Markov Switching Models, and Neural Network Models.
Stationarity: “Stationarity” refers to the property where the statistical characteristics of a time
series remain constant over time. This means that the mean, variance, and autocorrelation
structure of the time series do not change. Non-linear time series models can be used to account
for non-linear relationships while maintaining stationarity. Examples of stationary non-linear time
series models include Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized
ARCH (GARCH) Models, Threshold Autoregressive (TAR) Models, Smooth Transition
Autoregressive (STAR) Models, Markov Switching Models, and Non-linear Moving Average
(NMA) Models.
Threshold Autoregressive (TAR) models are a type of non-linear time series model. These
models switch between different regimes or behaviors based on the value of an observed variable relative
to certain thresholds. This approach allows the model to capture non-linear relationships by dividing the
data into different regimes and fitting a separate autoregressive model to each regime. The TAR package
in R provides Bayesian modeling of autoregressive threshold time series models. It identifies the number
of regimes, thresholds, and autoregressive orders, as well as estimates remaining parameters.
It consists of two parts: one for observations below the threshold and another for observations above the
threshold. A two-regime TAR(p) model can be written as:
y_t = \phi_{1,0} + \sum_{j=1}^{p} \phi_{1,j} y_{t-j} + \epsilon_t, if y_{t-d} \le \tau
y_t = \phi_{2,0} + \sum_{j=1}^{p} \phi_{2,j} y_{t-j} + \epsilon_t, if y_{t-d} > \tau
Where:
\phi_{i,j} are the coefficients for the AR model in the i-th regime, with i = 1, 2 denoting the regimes.
\tau is the threshold value.
d is the delay parameter, indicating the lag that the threshold depends on.
\epsilon_t is a white-noise error term.
Estimation:
The threshold \tau can be determined by methods such as grid search, where various potential thresholds
are tested, and the one that minimizes a chosen criterion (e.g., AIC or BIC) is selected.
The delay parameter d and the autoregressive coefficients \phi_{i,j} are typically estimated using standard
regression techniques within each regime.
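To make the grid-search idea concrete, here is a hedged, numpy-only sketch (not the R TAR package mentioned above) that simulates a two-regime TAR(1) series and recovers the threshold by minimizing the total sum of squared residuals over a grid of candidate thresholds; using the residual sum of squares instead of AIC/BIC is a simplification, and all parameter values are assumptions.
Python
import numpy as np

rng = np.random.default_rng(5)
n = 600
y = np.zeros(n)
# Simulate a two-regime TAR(1): coefficient 0.7 below the threshold 0.0, -0.4 above it
for t in range(1, n):
    phi = 0.7 if y[t - 1] <= 0.0 else -0.4
    y[t] = phi * y[t - 1] + rng.normal()

def regime_sse(y, tau):
    """Fit AR(1) by least squares in each regime and return the total SSE."""
    x, target = y[:-1], y[1:]
    sse = 0.0
    for mask in (x <= tau, x > tau):
        if mask.sum() < 10:          # skip thresholds that leave a regime nearly empty
            return np.inf
        phi_hat = np.sum(x[mask] * target[mask]) / np.sum(x[mask] ** 2)
        sse += np.sum((target[mask] - phi_hat * x[mask]) ** 2)
    return sse

# Grid search over candidate thresholds (inner quantiles of the data)
candidates = np.quantile(y, np.linspace(0.15, 0.85, 71))
best_tau = min(candidates, key=lambda tau: regime_sse(y, tau))
print("estimated threshold:", round(best_tau, 3))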
Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized ARCH (GARCH) models are
essential for modeling the conditional variance of a time series, particularly in financial econometrics.
They are indispensable for capturing the volatility clustering phenomenon observed in many financial
time series, where periods of high volatility tend to be followed by further high-volatility periods, and calm
periods by calm periods.
Model Structure:
The ARCH(q) model specifies that the conditional variance of a time series is a function of its past
squared residuals. Mathematically, it can be represented as:
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2
Where:
\sigma_t^2 is the conditional variance at time t.
\epsilon_{t-i} are the past residuals (shocks).
\alpha_0 > 0 and \alpha_i \ge 0 are the model parameters.
Estimation:
Estimating the parameters \alpha_i of the ARCH model involves methods such as maximum likelihood
estimation (MLE), in which the likelihood of the observed residuals is maximized to find the optimal parameters.
Model Structure:
The GARCH(p, q) model extends the ARCH model by incorporating both autoregressive and moving
average terms for the conditional variance. The GARCH(p, q) model can be represented as:
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2
Where:
\alpha_i are the ARCH coefficients on past squared residuals.
\beta_j are the GARCH coefficients on past conditional variances.
\alpha_0 > 0, \alpha_i \ge 0, and \beta_j \ge 0.
Estimation:
Estimating the parameters \alpha_i and \beta_j of the GARCH model also involves methods such as
maximum likelihood estimation (MLE). The process is similar to that of the ARCH model but involves
optimizing the likelihood function with respect to both sets of parameters.
Applications:
ARCH and GARCH models are widely used in financial modeling for tasks such as volatility forecasting,
risk measurement (e.g., Value at Risk), and option pricing.
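As a hedged, self-contained sketch of the GARCH(1,1) recursion above, the code below simulates a return series in plain numpy using illustrative parameter values (no estimation library is assumed) and prints summary quantities that reflect volatility clustering.
Python
import numpy as np

# Illustrative GARCH(1,1) parameters (assumed values, not estimates)
alpha0, alpha1, beta1 = 0.05, 0.1, 0.85

rng = np.random.default_rng(3)
n = 1000
sigma2 = np.zeros(n)   # conditional variance sigma_t^2
eps = np.zeros(n)      # simulated returns / residuals
sigma2[0] = alpha0 / (1 - alpha1 - beta1)  # start at the unconditional variance

for t in range(1, n):
    # GARCH(1,1) recursion: sigma_t^2 = alpha0 + alpha1 * eps_{t-1}^2 + beta1 * sigma_{t-1}^2
    sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.normal()

# Volatility clustering: large absolute returns tend to bunch together in time
print("mean conditional variance:", sigma2.mean())
print("indices of largest |returns|:", np.argsort(np.abs(eps))[-5:])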
Smooth Transition Autoregressive (STAR) models represent a type of nonlinear time series model that
facilitates smooth transitions between different regimes. In contrast to Threshold Autoregressive (TAR)
models, which switch abruptly between regimes, STAR models transition smoothly from one regime to
another based on an underlying transition function.
Model Structure:
A two-regime STAR(p) model can be written as:
y_t = \phi_{1,0} + \sum_{i=1}^{p} \phi_{1,i} y_{t-i} + (\phi_{2,0} + \sum_{i=1}^{p} \phi_{2,i} y_{t-i}) G(s_{t-d}; \gamma, c) + \epsilon_t
Where:
\phi_{1,i} are the parameters associated with the linear part of the model.
\phi_{2,i} are the parameters associated with the nonlinear part of the model.
G(s_{t-d}; \gamma, c) is the transition function (for example, a logistic function), with transition variable
s_{t-d}, smoothness parameter \gamma, and location parameter c.
Note: The transition function G(s_{t-d}; \gamma, c) determines how smoothly the model transitions
between regimes.
Estimation:
The parameters are typically estimated by nonlinear least squares or maximum likelihood; a grid search
over \gamma and c is often used to obtain starting values.
Applications:
STAR models are useful in various contexts where smooth transitions between different regimes are
expected. Common applications include:
Economic and financial time series, where market conditions change gradually.
Environmental data, where changes can be gradual and influenced by multiple factors.
Any scenario where a smooth transition between states is more realistic than an abrupt switch.
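The sketch below is a minimal numpy illustration, with assumed parameter values, that simulates a logistic STAR (LSTAR) process using the logistic transition function G described above; it is not tied to any particular estimation package.
Python
import numpy as np

def logistic_transition(s, gamma, c):
    # G(s; gamma, c): smooth transition between 0 and 1 around location c
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

# Illustrative LSTAR(1) parameters (assumed values)
phi1, phi2 = 0.3, -0.8      # linear-part and nonlinear-part AR coefficients
gamma, c, d = 5.0, 0.0, 1   # smoothness, location, delay

rng = np.random.default_rng(4)
n = 500
y = np.zeros(n)
for t in range(1, n):
    G = logistic_transition(y[t - d], gamma, c)
    # The regime weight G shifts the effective AR coefficient smoothly between phi1 and phi1 + phi2
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 1] * G + rng.normal(scale=0.5)

print("sample mean:", y.mean(), "sample std:", y.std())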
Non-Moving Average (NMA) models are not a standard class of time series models like AR
(Autoregressive), MA (Moving Average), ARMA (Autoregressive Moving Average), or ARIMA
(Autoregressive Integrated Moving Average) models. However, the term “Non-Moving Average” can be
interpreted to refer to time series models that do not include a moving average component. In this sense,
NMA models would encompass purely autoregressive models or other models that do not explicitly
incorporate moving average terms.
The AR model is a classic example of a time series model that does not include a moving average
component.
Model Structure:
An AR(p) model, where p is the order of the autoregressive process, can be written as:
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \epsilon_t
Where:
c is a constant term.
\phi_1, \dots, \phi_p are the autoregressive coefficients.
\epsilon_t is a white-noise error term.
Estimation:
The parameters of the AR model can be estimated using methods such as:
Ordinary Least Squares (OLS): Minimizing the sum of squared residuals to estimate the
coefficients (a minimal sketch follows this list).
Maximum Likelihood Estimation (MLE): Maximizing the likelihood function for the observed
data.
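The following is a minimal, hedged sketch of the OLS approach for an AR(2) model, assuming a simulated series; it builds the lagged design matrix by hand and solves for the coefficients with numpy's least-squares routine.
Python
import numpy as np

# Simulate an AR(2) series with assumed coefficients 0.5 and 0.3
rng = np.random.default_rng(6)
n, phi = 800, (0.5, 0.3)
y = np.zeros(n)
for t in range(2, n):
    y[t] = phi[0] * y[t - 1] + phi[1] * y[t - 2] + rng.normal()

# Build the regression: y_t on [1, y_{t-1}, y_{t-2}]
p = 2
Y = y[p:]
X = np.column_stack([np.ones(n - p)] + [y[p - k : n - k] for k in range(1, p + 1)])

# OLS via least squares: minimizes the sum of squared residuals
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("estimated constant and AR coefficients:", np.round(coef, 3))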
Neural Network
Neural Networks are computational models that mimic the complex functions of the human brain. The
neural networks consist of interconnected nodes or neurons that process and learn from data, enabling
tasks such as pattern recognition and decision making in machine learning.
Neural networks extract identifying features from data without any pre-programmed understanding. Network
components include neurons, connections, weights, biases, propagation functions, and a learning rule.
Neurons receive inputs governed by thresholds and activation functions. Connections involve weights
and biases that regulate information transfer. Learning, which adjusts the weights and biases, occurs in
three stages: input computation, output generation, and iterative refinement that enhances the network’s
proficiency in diverse tasks.
These include:
1. The neural network is stimulated by an environment.
2. The free parameters of the neural network are then changed as a result of this stimulation.
3. The neural network responds in a new way to the environment because of the changes in its
free parameters.
Importance of Neural Networks
The ability of neural networks to identify patterns, solve intricate puzzles, and adjust to changing
surroundings is essential. Their capacity to learn from data has far-reaching effects, ranging from
revolutionizing technology like natural language processing and self-driving automobiles to automating
decision-making processes and increasing efficiency in numerous industries. The development of
artificial intelligence is largely dependent on neural networks, which also drive innovation and influence
the direction of technology.
Neural networks are complex systems that mimic some features of the functioning of the human brain. A
neural network is composed of an input layer, one or more hidden layers, and an output layer, each made
up of coupled artificial neurons. The basic process has two stages, called forward propagation
and backpropagation.
Forward Propagation
Input Layer: Each feature in the input layer is represented by a node on the network, which
receives input data.
Weights and Connections: The weight of each neuronal connection indicates how strong the
connection is. Throughout training, these weights are changed.
Hidden Layers: Each hidden layer neuron processes inputs by multiplying them by weights,
adding them up, and then passing them through an activation function. By doing this, non-
linearity is introduced, enabling the network to recognize intricate patterns.
Output: The final result is produced by repeating the process until the output layer is reached.
Backpropagation
Loss Calculation: The network’s output is evaluated against the real goal values, and a loss
function is used to compute the difference. For a regression problem, the Mean Squared
Error (MSE) is commonly used as the cost function.
Loss Function: for a regression problem, MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, where
y_i are the target values and \hat{y}_i are the network’s predictions.
Gradient Descent: Gradient descent is then used by the network to reduce the loss. To lower the
inaccuracy, weights are changed based on the derivative of the loss with respect to each weight.
Adjusting weights: The weights are adjusted at each connection by applying this iterative
process, or backpropagation, backward across the network.
Training: During training with different data samples, the entire process of forward propagation,
loss calculation, and backpropagation is done iteratively, enabling the network to adapt and learn
patterns from the data.
Activation Functions: Model non-linearity is introduced by activation functions like the rectified
linear unit (ReLU) or sigmoid. Their decision on whether to “fire” a neuron is based on the total
weighted input.
Multilayer Perceptron (MLP): MLP is a type of feedforward neural network with three or more
layers, including an input layer, one or more hidden layers, and an output layer. It uses nonlinear
activation functions.
Recurrent Neural Network (RNN): An artificial neural network type intended for sequential
data processing is called a Recurrent Neural Network (RNN). It is appropriate for applications
where contextual dependencies are critical, such as time series prediction and natural language
processing, since it makes use of feedback loops, which enable information to survive within the
network.
Long Short-Term Memory (LSTM): LSTM is a type of RNN that is designed to overcome the
vanishing gradient problem in training RNNs. It uses memory cells and gates to selectively read,
write, and erase information.
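Since this unit is about time series, here is a hedged sketch of how an LSTM might be applied to one-step-ahead forecasting, assuming the Keras API from TensorFlow is available; the window length, layer size, and training settings are illustrative assumptions rather than recommendations.
Python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Synthetic noisy sine-wave series and sliding windows of length 20 (assumed window size)
rng = np.random.default_rng(7)
series = np.sin(np.linspace(0, 40 * np.pi, 2000)) + 0.1 * rng.normal(size=2000)
window = 20
X = np.array([series[i : i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]  # shape (samples, timesteps, features) expected by LSTM layers

# Small LSTM regressor for one-step-ahead prediction
model = Sequential([
    LSTM(32, input_shape=(window, 1)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

print("next-step prediction:", model.predict(X[-1:], verbose=0).ravel())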
Computing the gradient in the backpropagation algorithm helps to minimize the cost function; it is
implemented by using the chain rule from calculus to navigate through the layers of the neural network.
Working of Backpropagation Algorithm
Forward pass
Backward pass
In forward pass, initially the input is fed into the input layer. Since the inputs are raw data, they
can be used for training our neural network.
The inputs and their corresponding weights are passed to the hidden layer. The hidden layer
performs the computation on the data it receives. If there are two hidden layers in the neural
network, for instance, consider the illustration fig(a), h1 and h2 are the two hidden layers, and the
output of h1 can be used as an input of h2. Before applying it to the activation function, the bias
is added.
In the hidden layer, the activation function is applied to the weighted sum of inputs for each of its
neurons. One commonly used activation function is ReLU, which returns the input if it is positive and
zero otherwise. This introduces non-linearity into the model, enabling the network to learn complex
relationships in the data. Finally, the weighted outputs from the last hidden layer are fed into the output
layer to compute the final prediction; this layer can use an activation function such as softmax, which
converts the weighted outputs into probabilities for each class.
In the backward pass, the error is transmitted back through the network, which helps the network
improve its performance by learning and adjusting its internal weights.
To find the error generated through the process of forward pass, we can use one of the most
commonly used methods called mean squared error which calculates the difference between the
predicted output and the desired output. The formula for mean squared error is: Mean squared error =
(predicted output – actual output)^2, averaged over all training examples.
Once we have done the calculation at the output layer, we then propagate the error backward
through the network, layer by layer.
The key calculation during the backward pass is determining the gradients for each weight and
bias in the network. This gradient is responsible for telling us how much each weight/bias should
be adjusted to minimize the error in the next forward pass. The chain rule is used iteratively to
calculate this gradient efficiently.
In addition to gradient calculation, the activation function also plays a crucial role in
backpropagation, it works by calculating the gradients with the help of the derivative of the
activation function.
Python program for backpropagation
1. Neural Network Initialization: The NeuralNetwork class is initialized with parameters for the
input size, hidden layer size, and output size. It also initializes the weights and biases with
random values.
2. Sigmoid Activation Function: The sigmoid method implements the sigmoid activation function,
which squashes the input to a value between 0 and 1.
3. Sigmoid Derivative: The sigmoid_derivative method calculates the derivative of the sigmoid
function, which is used when computing the gradients of the loss function with respect to the weights.
4. Feedforward Pass: The feedforward method calculates the activations of the hidden and output
layers based on the input data and current weights and biases. It uses matrix multiplication to
propagate the inputs through the network.
5. Backward Pass: The backward method propagates the error from the output layer back through the
network, computes the gradients with respect to the weights and biases using the sigmoid derivative,
and applies gradient-descent updates scaled by the learning rate.
6. Training the Neural Network: The train method trains the neural network using the specified
number of epochs and learning rate. It iterates through the training data, performs the feedforward
and backward passes, and updates the weights and biases accordingly.
7. XOR Dataset: The XOR dataset (X) is defined, which contains input pairs that represent the
XOR operation, where the output is 1 if exactly one of the inputs is 1, and 0 otherwise.
8. Testing the Trained Model: After training, the neural network is tested on the XOR dataset (X)
to see how well it has learned the XOR function. The predicted outputs are printed to the console,
showing the neural network’s predictions for each input pair.
Python
import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        # Initialize weights and biases with random values
        self.weights_input_hidden = np.random.randn(self.input_size, self.hidden_size)
        self.weights_hidden_output = np.random.randn(self.hidden_size, self.output_size)
        self.bias_hidden = np.zeros((1, self.hidden_size))
        self.bias_output = np.zeros((1, self.output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        return x * (1 - x)

    def feedforward(self, X):
        # Input to hidden
        self.hidden_activation = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = self.sigmoid(self.hidden_activation)
        # Hidden to output
        self.output_activation = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output
        self.predicted_output = self.sigmoid(self.output_activation)
        return self.predicted_output

    def backward(self, X, y, learning_rate):
        # Error and gradient at the output layer
        output_error = y - self.predicted_output
        output_delta = output_error * self.sigmoid_derivative(self.predicted_output)
        # Propagate the error back to the hidden layer (chain rule)
        hidden_error = np.dot(output_delta, self.weights_hidden_output.T)
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_output)
        # Gradient-descent updates of weights and biases
        self.weights_hidden_output += learning_rate * np.dot(self.hidden_output.T, output_delta)
        self.bias_output += learning_rate * np.sum(output_delta, axis=0, keepdims=True)
        self.weights_input_hidden += learning_rate * np.dot(X.T, hidden_delta)
        self.bias_hidden += learning_rate * np.sum(hidden_delta, axis=0, keepdims=True)

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            output = self.feedforward(X)
            self.backward(X, y, learning_rate)
            if epoch % 4000 == 0:
                print(f"Epoch {epoch}, Loss:{np.mean((y - output) ** 2)}")

# XOR dataset and illustrative training settings
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)

output = nn.feedforward(X)
print(output)
Output:
Epoch 0, Loss:0.36270360966344145
[[0.02477654]
[0.95625286]
[0.96418129]
[0.04729297]]
Fuzzy Logic is a form of many-valued logic in which the truth values of variables may be any
real number between 0 and 1, instead of just the traditional values of true or false. It is used to
deal with imprecise or uncertain information and is a mathematical method for representing
vagueness and uncertainty in decision-making.
Fuzzy Logic is based on the idea that in many cases, the concept of true or false is too restrictive,
and that there are many shades of gray in between. It allows for partial truths, where a statement
can be partially true or false, rather than fully true or false.
Fuzzy Logic is used in a wide range of applications, such as control systems, image processing,
natural language processing, medical diagnosis, and artificial intelligence.
The fundamental concept of Fuzzy Logic is the membership function, which defines the degree of
membership of an input value to a certain set or category. The membership function is a mapping
from an input value to a membership degree between 0 and 1, where 0 represents non-
membership and 1 represents full membership.
Fuzzy Logic is implemented using Fuzzy Rules, which are if-then statements that express the
relationship between input variables and output variables in a fuzzy way. The output of a Fuzzy
Logic system is a fuzzy set, which is a set of membership degrees for each possible output value.
In summary, Fuzzy Logic is a mathematical method for representing vagueness and uncertainty in
decision-making, it allows for partial truths, and it is used in a wide range of applications. It is
based on the concept of membership function and the implementation is done using Fuzzy rules.
In the Boolean system, the truth value 1.0 represents absolute truth and 0.0 represents absolute
falsehood. Fuzzy logic is not restricted to these absolute values: it also admits intermediate values
that are partially true and partially false.
ARCHITECTURE
RULE BASE: It contains the set of rules and the IF-THEN conditions provided by the experts to
govern the decision-making system, on the basis of linguistic information. Recent developments
in fuzzy theory offer several effective methods for the design and tuning of fuzzy controllers.
Most of these developments reduce the number of fuzzy rules.
FUZZIFICATION: It is used to convert inputs i.e. crisp numbers into fuzzy sets. Crisp inputs are
basically the exact inputs measured by sensors and passed into the control system for processing,
such as temperature, pressure, rpm, etc.
INFERENCE ENGINE: It determines the matching degree of the current fuzzy input with respect
to each rule and decides which rules are to be fired according to the input field. Next, the fired
rules are combined to form the control actions.
DEFUZZIFICATION: It is used to convert the fuzzy sets obtained by the inference engine into a
crisp value. There are several defuzzification methods available and the best-suited one is used
with a specific expert system to reduce the error.
Membership function
Definition: A graph that defines how each point in the input space is mapped to a membership
value between 0 and 1. The input space is often referred to as the universe of discourse or universal
set (u), which contains all the possible elements of concern in each particular application.
Singleton fuzzifier
Gaussian fuzzifier
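Below is a minimal, hedged sketch of these ideas in plain numpy: triangular membership functions define fuzzy sets over an assumed temperature universe of discourse, two illustrative IF-THEN rules are fired (Mamdani-style min-max inference), and centroid defuzzification returns a crisp fan-speed value. The sets, rules, and numeric ranges are invented for illustration.
Python
import numpy as np

def tri(x, a, b, c):
    # Triangular membership function with feet at a and c and peak at b
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Crisp input from a sensor (assumed): temperature in degrees Celsius
temperature = 32.0

# FUZZIFICATION: degrees of membership in the input fuzzy sets
warm = tri(temperature, 15, 25, 35)
hot = tri(temperature, 25, 40, 55)

# RULE BASE + INFERENCE ENGINE (Mamdani min-max):
#   Rule 1: IF temperature is warm THEN fan speed is medium
#   Rule 2: IF temperature is hot  THEN fan speed is high
speed = np.linspace(0, 100, 501)          # universe of discourse for the output
medium = tri(speed, 10, 40, 70)
high = tri(speed, 50, 75, 100)
aggregated = np.maximum(np.minimum(warm, medium), np.minimum(hot, high))

# DEFUZZIFICATION: centroid of the aggregated output set gives a crisp fan speed
crisp_speed = np.sum(aggregated * speed) / np.sum(aggregated)
print(f"warm={warm:.2f}, hot={hot:.2f}, fan speed={crisp_speed:.1f}")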
It may not be designed to give accurate reasoning but it is designed to give acceptable reasoning.
It can emulate human deductive thinking, that is, the process people use to infer conclusions from
what they know.
Any uncertainties can be dealt with easily with the help of fuzzy logic.
This system can work with any type of inputs whether it is imprecise, distorted or noisy input
information.
The construction of Fuzzy Logic Systems is easy and understandable.
Fuzzy logic comes with mathematical concepts of set theory and the reasoning of that is quite
simple.
It provides a very efficient solution to complex problems in all fields of life as it resembles
human reasoning and decision-making.
The algorithms can be described with little data, so little memory is required.
Many researchers proposed different ways to solve a given problem through fuzzy logic which
leads to ambiguity. There is no systematic approach to solve a given problem through fuzzy logic.
Proof of its characteristics is difficult or impossible in most cases because every time we do not
get a mathematical description of our approach.
Because fuzzy logic works on both precise and imprecise data, accuracy is often
compromised.
Application
It is used in the aerospace field for altitude control of spacecraft and satellites.
It has been used in automotive systems for speed control and traffic control.
It is used for decision-making support systems and personal evaluation in the large company
business.
It has applications in the chemical industry for controlling pH, drying, and the chemical distillation
process.
Fuzzy logic is used in Natural language processing and various intensive applications in Artificial
Intelligence.
Fuzzy logic is extensively used in modern control systems such as expert systems.
Fuzzy Logic is used with Neural Networks as it mimics how a person would make decisions, only
much faster. It is done by Aggregation of data and changing it into more meaningful data by
forming partial truths as Fuzzy sets.
Genetic Algorithms
Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to the larger class of
evolutionary algorithms. Genetic algorithms are based on the ideas of natural selection and
genetics. They are an intelligent exploitation of random search, provided with historical data to
direct the search into the region of better performance in solution space. They are commonly
used to generate high-quality solutions for optimization problems and search problems.
Genetic algorithms simulate the process of natural selection which means those species that
can adapt to changes in their environment can survive and reproduce and go to the next
generation. In simple words, they simulate “survival of the fittest” among individuals of
consecutive generations to solve a problem. Each generation consists of a population of
individuals and each individual represents a point in search space and possible solution. Each
individual is represented as a string of character/integer/float/bits. This string is analogous to the
Chromosome.
Genetic algorithms are based on an analogy with the genetic structure and behavior of
chromosomes of the population. Following is the foundation of GAs based on this analogy:
1. Individuals in a population compete for resources and mate.
2. Those individuals who are successful (fittest) then mate to create more offspring than others.
3. Genes from the “fittest” parents propagate throughout the generation; that is, sometimes parents
create offspring that are better than either parent.
Search space
A population of individuals is maintained within the search space. Each individual represents a
solution in the search space for the given problem. Each individual is coded as a finite-length vector
(analogous to a chromosome) of components. These variable components are analogous to genes.
Thus a chromosome (individual) is composed of several genes (variable components).
Fitness Score
A fitness score is given to each individual which shows the ability of an individual to
“compete”. Individuals having an optimal (or near-optimal) fitness score are sought.
The GA maintains a population of n individuals (chromosomes/solutions) along with their
fitness scores. The individuals having better fitness scores are given more chance to reproduce
than others. Such individuals are selected to mate and produce better offspring by combining the
chromosomes of the parents. The population size is static, so room has to be created for new arrivals:
some individuals die and are replaced by new arrivals, eventually creating a new generation once all
the mating opportunities of the old population are exhausted. It is hoped that over successive
generations better solutions will arrive while the least fit die out.
Each new generation has, on average, more “good genes” than the individuals (solutions) of
previous generations; thus each new generation has better “partial solutions” than previous
generations. Once the offspring produced show no significant difference from the offspring
produced by previous populations, the population has converged, and the algorithm is said to have
converged to a set of solutions for the problem.
Once the initial generation is created, the algorithm evolves the generation using following
operators –
1) Selection Operator: The idea is to give preference to the individuals with good fitness scores
and allow them to pass their genes to successive generations.
2) Crossover Operator: This represents mating between individuals. Two individuals are
selected using the selection operator, and crossover sites are chosen randomly. Then the genes at these
crossover sites are exchanged, creating a completely new individual (offspring).
3) Mutation Operator: The key idea is to insert random genes into offspring to maintain
diversity in the population and avoid premature convergence. A short sketch combining the selection,
crossover, and mutation operators follows below.
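The following is a minimal, hedged Python sketch of a genetic algorithm that combines the selection, crossover, and mutation operators described above; the toy fitness function (maximizing the number of 1-bits in a chromosome) and all parameter values are illustrative assumptions.
Python
import numpy as np

rng = np.random.default_rng(8)
CHROM_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 50, 0.01

def fitness(individual):
    # Toy fitness: number of 1-bits in the chromosome (the "OneMax" problem)
    return individual.sum()

def select(population, scores):
    # Selection operator: fitness-proportionate ("roulette wheel") selection
    probs = scores / scores.sum()
    idx = rng.choice(len(population), size=2, p=probs)
    return population[idx[0]], population[idx[1]]

def crossover(parent1, parent2):
    # Crossover operator: exchange genes after a randomly chosen crossover site
    site = rng.integers(1, CHROM_LEN)
    return np.concatenate([parent1[:site], parent2[site:]])

def mutate(individual):
    # Mutation operator: flip random genes with a small probability
    flips = rng.random(CHROM_LEN) < MUTATION_RATE
    return np.where(flips, 1 - individual, individual)

population = rng.integers(0, 2, size=(POP_SIZE, CHROM_LEN))
for gen in range(GENERATIONS):
    scores = np.array([fitness(ind) for ind in population])
    offspring = []
    for _ in range(POP_SIZE):
        p1, p2 = select(population, scores)
        offspring.append(mutate(crossover(p1, p2)))
    population = np.array(offspring)

print("best fitness found:", max(fitness(ind) for ind in population))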
Applications of genetic algorithms include:
Mutation testing
Code breaking