Unit 4

Time Series Analysis

Time series analysis is a powerful statistical method that examines data points collected at regular
intervals to uncover underlying patterns and trends. This technique is highly relevant across various
industries, as it enables informed decision making and accurate forecasting based on historical data. By
understanding the past and predicting the future, time series analysis plays a crucial role in fields such as
finance, health care, energy, supply chain management, weather forecasting, marketing, and beyond.

Time series analysis is indispensable in data science, statistics, and analytics.

At its core, time series analysis focuses on studying and interpreting a sequence of data points recorded or
collected at consistent time intervals. Unlike cross-sectional data, which captures a snapshot in time, time
series data is fundamentally dynamic, evolving over chronological sequences both short and extremely
long. This type of analysis is pivotal in uncovering underlying structures within the data, such as trends,
cycles, and seasonal variations.

Components of Time Series Data

Time series data generally comprises several components that characterize the patterns and
behavior of the data over time. By analyzing these components, we can better understand the dynamics of
the time series and create more accurate models. Four main elements make up a time series dataset:

 Trends

 Seasonality

 Cycles

 Noise

 Trends show the general direction of the data, and whether it is increasing,
decreasing, or remaining stationary over an extended period of time. Trends
indicate the long-term movement in the data and can reveal overall growth or
decline. For example, e-commerce sales may show an upward trend over the
last five years.

 Seasonality refers to predictable patterns that recur regularly, like yearly retail
spikes during the holiday season. Seasonal components exhibit fluctuations fixed
in timing, direction, and magnitude. For instance, electricity usage may surge
every summer as people turn on their air conditioners.

 Cycles demonstrate fluctuations that do not have a fixed period, such as
economic expansions and recessions. These longer-term patterns last longer
than a year and do not have consistent amplitudes or durations. Business cycles
that oscillate between growth and decline are an example.

 Finally, noise encompasses the residual variability in the data that the other
components cannot explain. Noise includes unpredictable, erratic deviations after
accounting for trends, seasonality, and cycles.
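
To make these components concrete, the short sketch below decomposes a synthetic monthly series into trend, seasonal, and residual (noise) parts. It is a minimal illustration under assumed synthetic data and an assumed statsmodels dependency, not something taken from the text above.

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series: upward trend + yearly seasonality + noise (illustrative values only)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
trend = np.linspace(100, 200, 96)
seasonal = 10 * np.sin(2 * np.pi * np.arange(96) / 12)
noise = np.random.normal(scale=3, size=96)
series = pd.Series(trend + seasonal + noise, index=idx)

# Additive decomposition with a 12-month seasonal period
result = seasonal_decompose(series, model="additive", period=12)
print(result.trend.dropna().head())     # long-term movement
print(result.seasonal.head(12))         # repeating yearly pattern
print(result.resid.dropna().head())     # leftover noise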

Time series analysis is a statistical technique used to analyze data points gathered at consistent intervals
over a time span in order to detect patterns and trends. Understanding the fundamental framework of the
data assists in predicting future data points and making knowledgeable choices. A typical analysis involves the following steps:

 Collecting the data and cleaning it

 Preparing Visualization with respect to time vs key feature

 Observing the stationarity of the series

 Developing charts to understand its nature.

 Model building – AR, MA, ARMA and ARIMA

 Extracting insights from prediction
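
As a hedged sketch of the model-building and prediction steps above, the snippet below fits an ARIMA model to a univariate series using statsmodels (assumed to be installed) and produces a short forecast; the simulated data and the (1, 1, 1) order are illustrative assumptions, not prescriptions.

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Illustrative univariate series (random walk with drift); in practice this is the cleaned series from the earlier steps
np.random.seed(0)
y = pd.Series(np.cumsum(np.random.normal(loc=0.5, scale=1.0, size=200)))

# Fit an ARIMA(1, 1, 1) model; the order is an assumption, normally chosen from ACF/PACF plots or information criteria
model = ARIMA(y, order=(1, 1, 1))
fit = model.fit()
print(fit.summary())

# Forecast the next 10 points and extract insights from the prediction
forecast = fit.forecast(steps=10)
print(forecast)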

Significance of Time Series

TSA is the backbone for prediction and forecasting analysis, specific to time-based problem statements.

 Analyzing the historical dataset and its patterns

 Understanding and matching the current situation with patterns derived from the previous stage.

 Understanding the factor or factors influencing certain variable(s) in different periods.

With the help of “Time Series,” we can prepare numerous time-based analyses and results.

 Forecasting: Predicting any value for the future.

 Segmentation: Grouping similar items together.

 Classification: Classifying a set of items into given classes.

 Descriptive analysis: Analysis of a given dataset to find out what is there in it.

 Intervention analysis: Effect of changing a given variable on the outcome.

Components of Time Series Analysis

Let’s look at the various components of Time Series Analysis:


 Trend: the long-term movement of the series over a continuous timeline, with no fixed interval. A
trend can be positive, negative, or null.

 Seasonality: regular, fixed-interval shifts in the dataset over a continuous timeline. Seasonal
patterns often take a bell-curve or sawtooth shape.

 Cyclical: fluctuations with no fixed interval and with uncertainty in their movement and pattern.

 Irregularity: unexpected situations, events, or spikes occurring in a short time span.

Limitations of Time Series Analysis

Time series analysis has the limitations mentioned below; we have to take care of these during our data analysis.

 Like many other models, TSA does not support missing values.

 The data points must be linear in their relationship.

 Data transformations are mandatory, which makes the analysis somewhat expensive.

 Models mostly work on univariate data.

Data Types of Time Series

Let’s discuss the time series’ data types and their influence. While discussing TS data types, there are two
major types – stationary and non-stationary.

Stationary: A dataset is stationary if it follows the rules of thumb below and does not contain the trend,
seasonality, cyclical, or irregularity components of the time series.

 The mean value should be completely constant over the period of analysis.

 The variance should be constant with respect to the time frame.

 The covariance between observations should depend only on the lag between them, not on time itself.

Non-stationary: If the mean, variance, or covariance changes with respect to time, the dataset is called non-stationary.
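
In practice, stationarity is often checked with the Augmented Dickey-Fuller (ADF) test. The sketch below, assuming statsmodels is available, applies the test to a stationary white-noise series and to a non-stationary random walk; a small p-value is taken as evidence against a unit root, i.e., in favor of stationarity.

import numpy as np
from statsmodels.tsa.stattools import adfuller

np.random.seed(1)
stationary_series = np.random.normal(size=300)                  # white noise: stationary
nonstationary_series = np.cumsum(np.random.normal(size=300))    # random walk: non-stationary

for name, series in [("stationary", stationary_series), ("non-stationary", nonstationary_series)]:
    adf_stat, p_value, *_ = adfuller(series)
    # Rule of thumb: p-value < 0.05 -> reject the unit-root hypothesis (series looks stationary)
    print(f"{name}: ADF statistic = {adf_stat:.2f}, p-value = {p_value:.3f}")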

LINEAR TIME SERIES

Treating a collection of random variables observed over time as a time series {r_t}, linear time series
analysis provides a natural framework to study the dynamic structure of such a series. The theories of linear time
series discussed include stationarity, dynamic dependence, the autocorrelation function, modeling, and
forecasting. The econometric models introduced include (a) simple autoregressive (AR) models, (b)
simple moving-average (MA) models, (c) mixed autoregressive moving-average (ARMA) models, (d)
seasonal models, (e) unit-root nonstationarity, (f) regression models with time series errors, and (g)
fractionally differenced models for long-range dependence.

Nonlinear Time Series Models

Time series can have many patterns, including trends, seasonality, cycles, and irregularity. When
analyzing time series data, it is crucial to detect these patterns, understand their possible causes and
relationships, and know which algorithms can model and forecast each pattern. Trend behaviour can be
linear or nonlinear; a linear trend refers to a consistent upward or downward movement in the data over a
period of time.

Nonlinear time series models are indispensable for analyzing and predicting data where the relationship
between variables is not linear. These models capture intricate patterns and dependencies in time series
data, making them suitable for various real-world phenomena where linear models fall short.

Key Concepts of Nonlinear Time Series

 Non-linearity: Non-linear time series models are used to capture intricate relationships in time
series data that linear models cannot capture. These models are essential for accurately
representing and predicting behaviors in data where changes are not proportional to the inputs.
The common types of non-linear time series models include Threshold Autoregressive (TAR)
Models, Smooth Transition Autoregressive (STAR) Models, Autoregressive Conditional
Heteroskedasticity (ARCH) and Generalized ARCH (GARCH) Models, Markov Switching
Models, and Neural Network Models.

 Deterministic Non-linearity: The connection between variables is deterministic and can be
defined by mathematical functions such as polynomials or exponentials. Deterministic
non-linearity enables the capture of complex, non-linear relationships that linear models cannot
adequately describe. Polynomial models are a simple yet powerful way to model such non-linear
relationships: by fitting a polynomial to the data, we can effectively capture and predict non-
linear trends in the time series. Examples of deterministic non-linear models include Threshold
Models, Polynomial Models, Exponential Models, Logistic Models, and Smooth Transition Models.

 Stochastic Non-linearity: Stochastic non-linearity in time series models refers to situations in
which the relationship between variables is non-linear and involves randomness. Unlike
deterministic non-linear models, which have predictable outcomes based solely on past values,
stochastic non-linear models incorporate random components. This makes future values
inherently uncertain, even if the past values are known. Examples of stochastic non-linear
models include Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized ARCH
(GARCH) Models, Smooth Transition Autoregressive (STAR) Models, Markov Switching
Models, and Non-linear Moving Average (NMA) Models.

 Stationarity: “Stationarity” refers to the property where the statistical characteristics of a time
series remain constant over time. This means that the mean, variance, and autocorrelation
structure of the time series do not change. Non-linear time series models can be used to account
for non-linear relationships while maintaining stationarity. Examples of stationary non-linear time
series models include Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized
ARCH (GARCH) Models, Threshold Autoregressive (TAR) Models, Smooth Transition
Autoregressive (STAR) Models, Markov Switching Models, and Non-linear Moving Average
(NMA) Models.

Types of Non-linear Time Series Models

1. Threshold Autoregressive (TAR) Models:

Threshold Autoregressive (TAR) models are a type of non-linear time series model. These
models switch between different regimes or behaviors based on the value of an observed variable relative
to certain thresholds. This approach allows the model to capture non-linear relationships by dividing the
data into different regimes and fitting a separate autoregressive model to each regime. The TAR package
in R provides Bayesian modeling of autoregressive threshold time series models. It identifies the number
of regimes, thresholds, and autoregressive orders, as well as estimates remaining parameters.

It consists of two parts: one for observations below the threshold and another for observations above the
threshold.

The two-part TAR model is given by the following formula:

 For observations below the threshold:

y_t = \phi_{1,0} + \phi_{1,1}y_{t-1} + \phi_{1,2}y_{t-2} + \dots + \phi_{1,p}y_{t-p} + \epsilon_t, \quad \text{if } y_{t-d} \le \tau

 For observations above the threshold:

y_t = \phi_{2,0} + \phi_{2,1}y_{t-1} + \phi_{2,2}y_{t-2} + \dots + \phi_{2,p}y_{t-p} + \epsilon_t, \quad \text{if } y_{t-d} > \tau

Where:

 y_t is the observed time series at time 𝑡.

 \phi_{i,j} are the coefficients for the AR model in the 𝑖-th regime, with 𝑖 = 1,2 denoting the
regimes.

 \epsilon_t is the error term.

 \tau is the threshold.

 d is the delay parameter, indicating the lag that the threshold depends on.

 p represents the order of the autoregressive process.

Estimation:
The threshold \tau can be determined by methods such as grid search, where various potential thresholds
are tested, and the one that minimizes a chosen criterion (e.g., AIC or BIC) is selected.

The delay parameter d and the autoregressive coefficients \phi_{i,j} are typically estimated using standard
regression techniques within each regime.
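
The following is a minimal numpy sketch of this estimation idea for a two-regime TAR(1) model with delay d = 1: it simulates data, grid-searches candidate thresholds, fits each regime by ordinary least squares, and keeps the threshold with the smallest residual sum of squares. All parameter values are illustrative assumptions.

import numpy as np

np.random.seed(42)

# Simulate a two-regime TAR(1) process with true threshold tau = 0
n = 500
y = np.zeros(n)
for t in range(1, n):
    if y[t - 1] <= 0.0:
        y[t] = 0.5 + 0.6 * y[t - 1] + np.random.normal(scale=1.0)
    else:
        y[t] = -0.5 - 0.3 * y[t - 1] + np.random.normal(scale=1.0)

def fit_regime(y_now, y_lag):
    # Ordinary least squares for y_t = phi0 + phi1 * y_{t-1} + eps within one regime
    X = np.column_stack([np.ones_like(y_lag), y_lag])
    coef, _, _, _ = np.linalg.lstsq(X, y_now, rcond=None)
    rss = np.sum((y_now - X @ coef) ** 2)
    return coef, rss

y_now, y_lag = y[1:], y[:-1]
best = None
for tau in np.quantile(y_lag, np.linspace(0.15, 0.85, 30)):  # candidate thresholds
    low, high = y_lag <= tau, y_lag > tau
    if low.sum() < 20 or high.sum() < 20:   # require enough observations per regime
        continue
    c_low, rss_low = fit_regime(y_now[low], y_lag[low])
    c_high, rss_high = fit_regime(y_now[high], y_lag[high])
    total_rss = rss_low + rss_high
    if best is None or total_rss < best[0]:
        best = (total_rss, tau, c_low, c_high)

print("estimated threshold:", best[1])
print("regime 1 (below) coefficients:", best[2])
print("regime 2 (above) coefficients:", best[3])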

2. Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized ARCH (GARCH) Models:

Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized ARCH (GARCH) models are
essential for modeling the conditional variance of a time series, particularly in financial econometrics.
They are indispensable for capturing the volatility clustering phenomenon observed in many financial
time series, where periods of high volatility are consistently followed by similar periods, and vice versa.

Autoregressive Conditional Heteroskedasticity (ARCH) Model:

Model Structure:

The ARCH(q) model specifies that the conditional variance of a time series is a function of its past
squared residuals. Mathematically, it can be represented as:

\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t-i}^{2}

Where:

 \sigma_t^2 is the conditional variance of the time series at time t.

 \alpha_0 is a constant term.

 \alpha_i are the parameters of the model.

 \varepsilon_{t-i}^{2} are the squared residuals of the time series up to lag q.

Estimation:

Estimating the parameters \alpha_i of the ARCH model involves methods such as maximum likelihood
estimation (MLE). Typically, the sum of squared residuals is minimized to find the optimal parameters.

3. Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Model:

Model Structure:

The GARCH(p, q) model extends the ARCH model by incorporating both autoregressive and moving
average terms for the conditional variance. The GARCH(p, q) model can be represented as:

\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p}\alpha_i\varepsilon_{t-i}^{2} + \sum_{j=1}^{q}\beta_j\sigma_{t-j}^{2}

Where:

 \sigma_t^2 is the conditional variance of the time series at time t.


 \alpha_i and \beta_j are the parameters of the model.

 \varepsilon_{t-i}^{2} are the squared residuals of the time series up to lag p.

 \sigma_{t-j}^{2} are the conditional variances up to lag q.

Estimation:

Estimating the parameters \alpha_i and \beta_j of the GARCH model also involves methods such as
maximum likelihood estimation (MLE). The process is similar to that of the ARCH model but involves
optimizing the likelihood function with respect to both sets of parameters.
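
As a hedged illustration, the snippet below fits a GARCH(1,1) model by maximum likelihood using the third-party arch package (an assumed dependency); the simulated returns are purely illustrative stand-ins for real asset returns.

import numpy as np
from arch import arch_model

# Illustrative "returns" series; real applications would use observed asset returns
np.random.seed(7)
returns = np.random.normal(scale=1.0, size=1000)

# GARCH(1,1): conditional variance driven by one lagged squared residual and one lagged variance
am = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
res = am.fit(disp="off")                   # maximum likelihood estimation

print(res.summary())                       # estimated alpha_0, alpha_1, beta_1
print(res.conditional_volatility[:5])      # fitted sigma_t
forecast = res.forecast(horizon=5)         # variance forecast over 5 steps
print(forecast.variance.iloc[-1])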

Applications:

ARCH and GARCH models are widely used in financial modeling for:

 Forecasting volatility in asset returns.

 Risk management and portfolio optimization.

 Option pricing and hedging strategies.

 Evaluating the impact of news and events on financial markets.

4. Smooth Transition Autoregressive (STAR) Models:

Smooth Transition Autoregressive (STAR) models represent a type of nonlinear time series model that
facilitates smooth transitions between different regimes. In contrast to Threshold Autoregressive (TAR)
models, which switch abruptly between regimes, STAR models transition smoothly from one regime to
another based on an underlying transition function.

Model Structure

A basic STAR model can be written as:

y_t = \phi_{1,0} + \sum_{i=1}^{p}\phi_{1,i}y_{t-i} + \left(\phi_{2,0} + \sum_{i=1}^{p}\phi_{2,i}y_{t-i}\right)G(s_{t-d}; \gamma, c) + \epsilon_t

Where:

 y_t is the observed time series at time t.

 \phi_{1,i} are the parameters of the linear part of the model.

 \phi_{2,i} are the parameters associated with the nonlinear part of the model.

 G(s_{t-d}; \gamma, c) is the transition function.

 s_{t-d} is the transition variable with delay d.

 \gamma is the smoothness parameter.


 c is the threshold parameter.

 \epsilon_t is the error term.

Note: The transition function G(s_{t-d}; \gamma, c) determines how smoothly the model transitions
between regimes.
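
For intuition, here is a minimal numpy sketch of a logistic transition function, a common choice for G in smooth transition models; the functional form and parameter values are illustrative assumptions. A larger smoothness parameter \gamma makes the switch between regimes sharper and more TAR-like.

import numpy as np

def logistic_transition(s, gamma, c):
    # G(s; gamma, c) lies in [0, 1]: near 0 far below the threshold c, near 1 far above it
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

s = np.linspace(-3, 3, 7)
for gamma in (1.0, 10.0):
    print(f"gamma = {gamma}:", np.round(logistic_transition(s, gamma, c=0.0), 3))
# With gamma = 1 the regime weight changes gradually; with gamma = 10 it switches almost abruptly.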

Estimation:

1. Identifying the appropriate transition variable s_{t-d}.

2. Estimating the linear and nonlinear parameters (\phi_{1,i} and \phi_{2,i}).

3. Estimating the parameters of the transition function (\gamma and c).

Applications:

STAR models are useful in various contexts where smooth transitions between different regimes are
expected. Common applications include:

 Economic and financial time series, where market conditions change gradually.

 Environmental data, where changes can be gradual and influenced by multiple factors.

 Any scenario where a smooth transition between states is more realistic than an abrupt switch.

5. Non-linear Moving Average (NMA) Models:

Non-Moving Average (NMA) models are not a standard class of time series models like AR
(Autoregressive), MA (Moving Average), ARMA (Autoregressive Moving Average), or ARIMA
(Autoregressive Integrated Moving Average) models. However, the term “Non-Moving Average” can be
interpreted to refer to time series models that do not include a moving average component. In this sense,
NMA models would encompass purely autoregressive models or other models that do not explicitly
incorporate moving average terms.

Purely Autoregressive (AR) Models:

The AR model is a classic example of a time series model that does not include a moving average
component.

Model Structure:

An AR(p) model, where p is the order of the autoregressive process, can be written as:

y_t = \phi_0 + \phi_1y_{t-1} + \phi_2y_{t-2} + … + \phi_py_{t-p} + \epsilon_t

Where:

 y_t is the value of the time series at time t.

 \phi_0 is the intercept term (often assumed to be zero in many formulations).


 \phi_1, \phi_2,…, \phi_p are the parameters of the model.

 \epsilon_t is the error term or white noise.

Estimation:

The parameters of the AR model can be estimated using methods such as:

 Ordinary Least Squares (OLS): Minimizing the sum of squared residuals to estimate the
coefficients.

 Yule-Walker Equations: Using the autocorrelation function to estimate the parameters.

 Maximum Likelihood Estimation (MLE): Maximizing the likelihood function for the observed
data.
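
A hedged sketch of fitting such a purely autoregressive model with statsmodels' AutoReg class is shown below; the simulated AR(2) data and the chosen lag order are illustrative assumptions.

import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Simulate an AR(2) process: y_t = 0.6*y_{t-1} - 0.2*y_{t-2} + eps_t
np.random.seed(3)
n = 500
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + np.random.normal()

# Fit an AR(2) model (OLS-based conditional estimation)
res = AutoReg(y, lags=2).fit()
print(res.params)                         # intercept, phi_1, phi_2 (should be near 0.6 and -0.2)
print(res.predict(start=n, end=n + 4))    # forecast the next five values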

Neural Network

Neural Networks are computational models that mimic the complex functions of the human brain. They
consist of interconnected nodes, or neurons, that process and learn from data, enabling tasks such as
pattern recognition and decision making in machine learning. This section explores neural networks, how
they work, their architecture, and more.

Neural networks extract identifying features from data without pre-programmed understanding. Network
components include neurons, connections, weights, biases, propagation functions, and a learning rule.
Neurons receive inputs governed by thresholds and activation functions. Connections involve weights and
biases that regulate information transfer. Learning, i.e., adjusting weights and biases, occurs in three
stages: input computation, output generation, and iterative refinement that enhances the network’s
proficiency in diverse tasks.

The learning process includes the following events:

1. The neural network is stimulated by an environment.

2. The free parameters of the neural network are changed as a result of this stimulation.

3. The neural network then responds in a new way to the environment because of the changes in its
free parameters.
Importance of Neural Networks

The ability of neural networks to identify patterns, solve intricate puzzles, and adjust to changing
surroundings is essential. Their capacity to learn from data has far-reaching effects, ranging from
revolutionizing technology like natural language processing and self-driving automobiles to automating
decision-making processes and increasing efficiency in numerous industries. The development of
artificial intelligence is largely dependent on neural networks, which also drive innovation and influence
the direction of technology.

Working of a Neural Network

Neural networks are complex systems that mimic some features of the functioning of the human brain. A
network is composed of an input layer, one or more hidden layers, and an output layer, each made up of
coupled artificial neurons. The basic process has two stages: forward propagation and backpropagation.
Forward Propagation

 Input Layer: Each feature in the input layer is represented by a node on the network, which
receives input data.

 Weights and Connections: The weight of each neuronal connection indicates how strong the
connection is. Throughout training, these weights are changed.

 Hidden Layers: Each hidden layer neuron processes inputs by multiplying them by weights,
adding them up, and then passing them through an activation function. By doing this, non-
linearity is introduced, enabling the network to recognize intricate patterns.

 Output: The final result is produced by repeating the process until the output layer is reached.

Backpropagation

 Loss Calculation: The network’s output is evaluated against the real goal values, and a loss
function is used to compute the difference. For a regression problem, the Mean Squared
Error (MSE) is commonly used as the cost function.

Loss Function (MSE): \text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, where y_i is the target value and \hat{y}_i is the network's prediction.

 Gradient Descent: Gradient descent is then used by the network to reduce the loss. To lower the
inaccuracy, weights are changed based on the derivative of the loss with respect to each weight.

 Adjusting weights: The weights are adjusted at each connection by applying this iterative
process, or backpropagation, backward across the network.

 Training: During training with different data samples, the entire process of forward propagation,
loss calculation, and backpropagation is done iteratively, enabling the network to adapt and learn
patterns from the data.
 Activation Functions: Model non-linearity is introduced by activation functions such as the rectified
linear unit (ReLU) or sigmoid. Whether a neuron “fires” is decided based on its total weighted input.

Types of Neural Networks

Several types of neural networks are commonly used, including:

 Feedforward Networks: A feedforward neural network is a simple artificial neural network
architecture in which data moves from input to output in a single direction. It has input, hidden,
and output layers; feedback loops are absent. Its straightforward architecture makes it appropriate
for a number of applications, such as regression and pattern recognition.

 Multilayer Perceptron (MLP): MLP is a type of feedforward neural network with three or more
layers, including an input layer, one or more hidden layers, and an output layer. It uses nonlinear
activation functions.

 Convolutional Neural Network (CNN): A Convolutional Neural Network (CNN) is a
specialized artificial neural network designed for image processing. It employs convolutional
layers to automatically learn hierarchical features from input images, enabling effective image
recognition and classification. CNNs have revolutionized computer vision and are pivotal in tasks
like object detection and image analysis.

 Recurrent Neural Network (RNN): An artificial neural network type intended for sequential
data processing is called a Recurrent Neural Network (RNN). It is appropriate for applications
where contextual dependencies are critical, such as time series prediction and natural language
processing, since it makes use of feedback loops, which enable information to survive within the
network.

 Long Short-Term Memory (LSTM): LSTM is a type of RNN that is designed to overcome the
vanishing gradient problem in training RNNs. It uses memory cells and gates to selectively read,
write, and erase information.

Back Propagation Neural Network

 Backpropagation is an effective algorithm used to train artificial neural networks, especially
feed-forward neural networks.

 Backpropagation is an iterative algorithm that helps minimize the cost function by determining
which weights and biases should be adjusted. During every epoch, the model learns by adapting
the weights and biases to minimize the loss, moving down along the gradient of the error. It
typically relies on optimization algorithms such as gradient descent or stochastic gradient descent.

 Computing the gradient in the backpropagation algorithm helps to minimize the cost function, and
it is implemented using the chain rule from calculus to navigate through the layers of the neural
network.
Working of Backpropagation Algorithm

The Backpropagation algorithm works by two different passes, they are:

 Forward pass

 Backward pass

How does Forward pass work?

 In the forward pass, the input is initially fed into the input layer. Since the inputs are raw data, they
can be used directly for training the neural network.

 The inputs and their corresponding weights are passed to the hidden layer, which performs
computations on the data it receives. If the network has two hidden layers h1 and h2, for instance,
the output of h1 is used as the input of h2. The bias is added before the activation function is applied.

 In each hidden layer, the activation function is applied to the weighted sum of inputs at every
neuron. A commonly used activation function is ReLU, which returns the input if it is positive and
zero otherwise. This introduces non-linearity into the model, enabling the network to learn complex
relationships in the data. Finally, the weighted outputs from the last hidden layer are fed into the
output layer to compute the final prediction; this layer often uses the softmax activation function,
which converts the weighted outputs into probabilities for each class.

The forward pass using weights and biases

How does backward pass work?

 In the backward pass, the error is transmitted back through the network, which helps the network
improve its performance by learning and adjusting its internal weights.

 To find the error generated during the forward pass, one of the most commonly used methods is the
mean squared error, which measures the difference between the predicted output and the desired
output. The formula for the mean squared error is: Mean squared error =
(predicted output – actual output)^2

 Once the calculation at the output layer is done, the error is propagated backward through the
network, layer by layer.

 The key calculation during the backward pass is determining the gradient for each weight and
bias in the network. This gradient tells us how much each weight/bias should be adjusted to
minimize the error in the next forward pass. The chain rule is applied iteratively to calculate these
gradients efficiently.

 In addition to gradient calculation, the activation function also plays a crucial role in
backpropagation: its derivative is used when calculating the gradients.
Python program for backpropagation

Here’s a simple implementation of a feedforward neural network with backpropagation in Python:

1. Neural Network Initialization: The NeuralNetwork class is initialized with parameters for the
input size, hidden layer size, and output size. It also initializes the weights and biases with
random values.

2. Sigmoid Activation Function: The sigmoid method implements the sigmoid activation function,
which squashes the input to a value between 0 and 1.

3. Sigmoid Derivative: The sigmoid_derivative method calculates the derivative of the sigmoid
function, which is used when computing the gradients of the loss function with respect to the weights.

4. Feedforward Pass: The feedforward method calculates the activations of the hidden and output
layers based on the input data and current weights and biases. It uses matrix multiplication to
propagate the inputs through the network.

5. Backpropagation: The backward method performs the backpropagation algorithm. It calculates
the error at the output layer and propagates it back through the network to update the weights and
biases using gradient descent.

6. Training the Neural Network: The train method trains the neural network using the specified
number of epochs and learning rate. It iterates through the training data, performs the feedforward
and backward passes, and updates the weights and biases accordingly.

7. XOR Dataset: The XOR dataset (X) is defined, which contains input pairs that represent the
XOR operation, where the output is 1 if exactly one of the inputs is 1, and 0 otherwise.

8. Testing the Trained Model: After training, the neural network is tested on the XOR dataset (X)
to see how well it has learned the XOR function. The predicted outputs are printed to the console,
showing the neural network’s predictions for each input pair.

Python

import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        # Initialize weights
        self.weights_input_hidden = np.random.randn(self.input_size, self.hidden_size)
        self.weights_hidden_output = np.random.randn(self.hidden_size, self.output_size)

        # Initialize the biases
        self.bias_hidden = np.zeros((1, self.hidden_size))
        self.bias_output = np.zeros((1, self.output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        return x * (1 - x)

    def feedforward(self, X):
        # Input to hidden
        self.hidden_activation = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = self.sigmoid(self.hidden_activation)

        # Hidden to output
        self.output_activation = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output
        self.predicted_output = self.sigmoid(self.output_activation)
        return self.predicted_output

    def backward(self, X, y, learning_rate):
        # Compute the output layer error
        output_error = y - self.predicted_output
        output_delta = output_error * self.sigmoid_derivative(self.predicted_output)

        # Compute the hidden layer error
        hidden_error = np.dot(output_delta, self.weights_hidden_output.T)
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_output)

        # Update weights and biases
        self.weights_hidden_output += np.dot(self.hidden_output.T, output_delta) * learning_rate
        self.bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
        self.weights_input_hidden += np.dot(X.T, hidden_delta) * learning_rate
        self.bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            output = self.feedforward(X)
            self.backward(X, y, learning_rate)
            if epoch % 4000 == 0:
                loss = np.mean(np.square(y - output))
                print(f"Epoch {epoch}, Loss:{loss}")

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)

# Test the trained model
output = nn.feedforward(X)
print("Predictions after training:")
print(output)

Output:

Epoch 0, Loss:0.36270360966344145

Epoch 4000, Loss:0.005546947165311874

Epoch 8000, Loss:0.00202378766386817

Predictions after training:

[[0.02477654]

[0.95625286]

[0.96418129]

[0.04729297]]

Basics of Fuzzy Sets and Fuzzy Logic

Fuzzy Logic is a form of many-valued logic in which the truth values of variables may be any
real number between 0 and 1, instead of just the traditional values of true or false. It is used to
deal with imprecise or uncertain information and is a mathematical method for representing
vagueness and uncertainty in decision-making.

Fuzzy Logic is based on the idea that in many cases, the concept of true or false is too restrictive,
and that there are many shades of gray in between. It allows for partial truths, where a statement
can be partially true or false, rather than fully true or false.

Fuzzy Logic is used in a wide range of applications, such as control systems, image processing,
natural language processing, medical diagnosis, and artificial intelligence.

The fundamental concept of Fuzzy Logic is the membership function, which defines the degree of
membership of an input value to a certain set or category. The membership function is a mapping
from an input value to a membership degree between 0 and 1, where 0 represents non-
membership and 1 represents full membership.
Fuzzy Logic is implemented using Fuzzy Rules, which are if-then statements that express the
relationship between input variables and output variables in a fuzzy way. The output of a Fuzzy
Logic system is a fuzzy set, which is a set of membership degrees for each possible output value.

In summary, Fuzzy Logic is a mathematical method for representing vagueness and uncertainty in
decision-making, it allows for partial truths, and it is used in a wide range of applications. It is
based on the concept of membership function and the implementation is done using Fuzzy rules.

In the Boolean system, the truth value 1.0 represents absolute truth and 0.0 represents absolute
falsehood. In a fuzzy system, there are no values reserved solely for absolute truth and absolute
falsehood; fuzzy logic also admits intermediate values that are partially true and partially false.

ARCHITECTURE

Its architecture contains four parts:

 RULE BASE: It contains the set of rules and the IF-THEN conditions provided by the experts to
govern the decision-making system, on the basis of linguistic information. Recent developments
in fuzzy theory offer several effective methods for the design and tuning of fuzzy controllers.
Most of these developments reduce the number of fuzzy rules.

 FUZZIFICATION: It is used to convert inputs i.e. crisp numbers into fuzzy sets. Crisp inputs are
basically the exact inputs measured by sensors and passed into the control system for processing,
such as temperature, pressure, rpm’s, etc.

 INFERENCE ENGINE: It determines the matching degree of the current fuzzy input with respect
to each rule and decides which rules are to be fired according to the input field. Next, the fired
rules are combined to form the control actions.
 DEFUZZIFICATION: It is used to convert the fuzzy sets obtained by the inference engine into a
crisp value. There are several defuzzification methods available and the best-suited one is used
with a specific expert system to reduce the error.

Membership function

Definition: A graph that defines how each point in the input space is mapped to a membership
value between 0 and 1. The input space is often referred to as the universe of discourse or universal
set (U), which contains all the possible elements of concern in each particular application.

There are largely three types of fuzzifiers:

 Singleton fuzzifier

 Gaussian fuzzifier

 Trapezoidal or triangular fuzzifier
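
To make these ideas concrete, the following numpy sketch (with purely illustrative temperature values and set definitions) implements a triangular and a Gaussian membership function and performs a simple centroid defuzzification; it is a minimal sketch, not a full fuzzy controller.

import numpy as np

def triangular(x, a, b, c):
    # Triangular membership: rises from a to a peak at b, then falls to c
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def gaussian(x, mean, sigma):
    # Gaussian membership centered at `mean`
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2)

# Universe of discourse: illustrative temperatures in degrees Celsius
x = np.linspace(0, 50, 501)
warm = triangular(x, 15, 25, 35)   # fuzzy set "warm"
hot = gaussian(x, 40, 5)           # fuzzy set "hot"

# Fuzzification: degree of membership of a crisp input
t = 30.0
print("warm(30) =", triangular(t, 15, 25, 35), " hot(30) =", gaussian(t, 40, 5))

# Centroid defuzzification of an aggregated output set (here simply the pointwise max of the two sets)
aggregated = np.maximum(warm, hot)
crisp_output = np.sum(x * aggregated) / np.sum(aggregated)
print("defuzzified value:", round(crisp_output, 2))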

What is Fuzzy Control?

 It is a technique to embody human-like thinking into a control system.

 It may not be designed to give accurate reasoning, but it is designed to give acceptable reasoning.

 It can emulate human deductive thinking, that is, the process people use to infer conclusions from
what they know.

 Any uncertainties can easily be dealt with using fuzzy logic.

Advantages of Fuzzy Logic System

 This system can work with any type of input, whether it is imprecise, distorted, or noisy
information.

 The construction of fuzzy logic systems is easy and understandable.

 Fuzzy logic builds on the mathematical concepts of set theory, and its reasoning is quite
simple.

 It provides a very efficient solution to complex problems in all fields of life, as it resembles
human reasoning and decision-making.

 The algorithms can be described with little data, so little memory is required.

Disadvantages of Fuzzy Logic Systems

 Many researchers have proposed different ways to solve a given problem through fuzzy logic, which
leads to ambiguity; there is no systematic approach to solving a given problem through fuzzy logic.

 Proof of its characteristics is difficult or impossible in most cases because we do not always
obtain a mathematical description of the approach.

 Because fuzzy logic works on precise as well as imprecise data, accuracy is often
compromised.

Applications

 It is used in the aerospace field for altitude control of spacecraft and satellites.

 It has been used in automotive systems for speed control and traffic control.

 It is used for decision-making support systems and personnel evaluation in large company
businesses.

 It has applications in the chemical industry for controlling pH, drying, and the chemical distillation
process.

 Fuzzy logic is used in natural language processing and various intensive applications in artificial
intelligence.

 Fuzzy logic is extensively used in modern control systems such as expert systems.

 Fuzzy logic is used with neural networks as it mimics how a person would make decisions, only
much faster. This is done by aggregating data and converting it into more meaningful information by
forming partial truths as fuzzy sets.

Genetic Algorithms

Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to the larger class of
evolutionary algorithms. Genetic algorithms are based on the ideas of natural selection and
genetics. They are an intelligent exploitation of random search, provided with historical data to
direct the search into the region of better performance in the solution space. They are commonly
used to generate high-quality solutions for optimization and search problems.

Genetic algorithms simulate the process of natural selection, meaning that species that can adapt
to changes in their environment survive, reproduce, and pass on to the next generation. In simple
words, they simulate “survival of the fittest” among individuals of consecutive generations to
solve a problem. Each generation consists of a population of individuals, and each individual
represents a point in the search space and a possible solution. Each individual is represented as a
string of characters/integers/floats/bits. This string is analogous to a chromosome.

Foundation of Genetic Algorithms

Genetic algorithms are based on an analogy with the genetic structure and behavior of
chromosomes of the population. Following is the foundation of GAs based on this analogy –

1. Individuals in the population compete for resources and mates.

2. Those individuals who are successful (fittest) then mate to create more offspring than others.

3. Genes from the “fittest” parents propagate through the generations; that is, sometimes parents
create offspring that are better than either parent.

4. Thus, each successive generation is better suited to its environment.

Search space

The population of individuals is maintained within a search space. Each individual represents a
solution in the search space for the given problem. Each individual is coded as a finite-length vector
(analogous to a chromosome) of components. These variable components are analogous to genes.
Thus, a chromosome (individual) is composed of several genes (variable components).

Fitness Score

A fitness score is given to each individual and indicates the ability of that individual to
“compete”. Individuals having an optimal (or near-optimal) fitness score are sought.

The GA maintains a population of n individuals (chromosomes/solutions) along with their fitness
scores. Individuals with better fitness scores are given more chances to reproduce than others;
they are selected to mate and produce better offspring by combining the chromosomes of the
parents. Since the population size is static, room has to be created for new arrivals, so some
individuals die and are replaced by newcomers, eventually creating a new generation once all the
mating opportunities of the old population are exhausted. The hope is that, over successive
generations, better solutions arrive while the least fit die out.

Each new generation has, on average, more “good genes” than the individuals (solutions) of
previous generations; thus, each new generation has better “partial solutions” than previous
generations. Once the offspring produced show no significant difference from the offspring
produced by previous populations, the population has converged, and the algorithm is said to
have converged to a set of solutions for the problem.

Operators of Genetic Algorithms

Once the initial generation is created, the algorithm evolves the generation using the following
operators:

1) Selection Operator: The idea is to give preference to individuals with good fitness scores
and allow them to pass their genes to successive generations.

2) Crossover Operator: This represents mating between individuals. Two individuals are
selected using the selection operator, and crossover sites are chosen randomly. The genes at these
crossover sites are then exchanged, creating a completely new individual (offspring).

3) Mutation Operator: The key idea is to insert random genes into offspring to maintain
diversity in the population and avoid premature convergence.

The whole algorithm can be summarized as –

1) Randomly initialize the population p

2) Determine fitness of population

3) Until convergence repeat:


a) Select parents from population

b) Crossover and generate new population

c) Perform mutation on new population

d) Calculate fitness for new population
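
The summary above can be turned into a short Python sketch. The OneMax problem (maximizing the number of 1-bits in a chromosome) and all parameter values below are illustrative assumptions used only to demonstrate the selection, crossover, and mutation operators.

import random

POP_SIZE, GENES, GENERATIONS, MUT_RATE = 20, 16, 50, 0.05

def fitness(ind):
    # OneMax: fitness is the number of 1s in the chromosome
    return sum(ind)

def select(pop):
    # Tournament selection of size 2: prefer the individual with the better fitness score
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # Single-point crossover at a randomly chosen site
    point = random.randint(1, GENES - 1)
    return p1[:point] + p2[point:]

def mutate(ind):
    # Flip each gene with a small probability to maintain diversity
    return [1 - g if random.random() < MUT_RATE else g for g in ind]

# 1) Randomly initialize the population
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]
best = max(population, key=fitness)

# 3) Repeat selection, crossover, and mutation until convergence (here: a fixed number of generations)
for gen in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]
    best = max(population, key=fitness)     # d) evaluate fitness of the new population
    if fitness(best) == GENES:              # stop once a perfect chromosome appears
        break

print("best individual:", best, "fitness:", fitness(best))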

Application of Genetic Algorithms

Genetic algorithms have many applications, some of them are –

 Recurrent Neural Network

 Mutation testing

 Code breaking

 Filtering and signal processing

 Learning fuzzy rule bases, etc.
