Unit 1

Uploaded by

Mohamed riyan

Unit I

Artificial neural network


Artificial Neural Network (ANN) refers to a computational model inspired by the way biological
neural networks function in the human brain. ANNs are a fundamental component of machine
learning and artificial intelligence. They consist of interconnected nodes, commonly referred to as
neurons or artificial neurons, organized into layers. ANNs are particularly well-suited for tasks such
as pattern recognition, classification, regression, and decision-making.
Our brain uses the extremely large interconnected network of neurons for information processing and
to model the world around us. Simply put, a neuron collects inputs from other neurons
using dendrites. The neuron sums all the inputs and if the resulting value is greater than a threshold, it
fires. The fired signal is then sent to other connected neurons through the axon.

The architecture of an artificial neural network:


Input Layer: As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer: The hidden layer sits between the input and output layers. It performs the
calculations that extract hidden features and patterns.
Output Layer: The input goes through a series of transformations in the hidden layer, and the
final result is delivered through this layer.

Each neuron in the network computes the weighted sum of its inputs and adds a bias. This
computation is represented in the form of a transfer function.

The weighted total is then passed as input to an activation function, which produces the neuron's
output. Activation functions decide whether a node should fire or not; only the signals of nodes that
fire are passed on towards the output layer. Different activation functions are available, and the
choice depends on the kind of task being performed.

THE PERCEPTRON

A perceptron is the simplest form of an artificial neural network (ANN), specifically a single-layer
neural network. It was introduced by Frank Rosenblatt in 1957. The perceptron is a binary classifier
that takes multiple binary inputs and produces a single binary output. Its primary function is to learn a
linear decision boundary that separates two classes.
Model of an artificial neuron
As you can see, the network of nodes sends signals in one direction. This is called a feed-forward
network.
The figure depicts a neuron connected with n other neurons, which thus receives n inputs (x1, x2, …, xn).
This configuration is called a Perceptron.
Let’s understand this better with an example. Say you bike to work. You have two factors to make
your decision to go to work: the weather must not be bad, and it must be a weekday. The weather’s
not that big a deal, but working on weekends is a big no-no. The inputs have to be binary, so let’s
propose the conditions as yes or no questions. Weather is fine? 1 for yes, 0 for no. Is it a weekday? 1
yes, 0 no.
Remember, I cannot tell the neural network these conditions; it has to learn them for itself. How will
it know which information will be most important in making its decision? It does so with something
called weights. Remember when I said that weather’s not a big deal, but the weekend is? Weights are
just a numerical representation of these preferences. A higher weight means the neural network
considers that input more important compared to other inputs.

For our example, let’s purposely set suitable weights of 2 for weather and 6 for weekday. Now how
do we calculate the output? We simply multiply each input by its respective weight, and sum up all
the values we get for all the inputs. For example, if it’s a nice, sunny (1) weekday (1), we would do
the following calculation:
(1 × 2) + (1 × 6) = 8

This calculation is known as a linear combination. Now what does an 8 mean? We first need to
define the threshold value. The neural network’s output, 0 or 1 (stay home or go to work), is
determined by whether the value of the linear combination reaches the threshold value. Let’s say the
threshold value is 5, which means that if the calculation gives you a number less than 5, you can stay
at home, but if it’s equal to or more than 5, then you have to go to work. Here 8 ≥ 5, so the output
is 1: go to work.
The perceptron is adding all the inputs and separating them into 2 categories, those that cause it to fire
and those that don’t. That is, it is drawing the line:
w1x1 + w2x2 = t,
where t is the threshold.
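The bike-to-work example can be checked with a few lines of Python. This is a minimal sketch using the weights (2 and 6) and the threshold (5) chosen above; the function names `perceptron_output` and `decision_table` are made up for this illustration.

```python
from itertools import product

WEIGHTS = (2, 6)   # weight for "weather is fine", weight for "it is a weekday"
THRESHOLD = 5      # threshold t from the example


def perceptron_output(inputs, weights=WEIGHTS, threshold=THRESHOLD):
    """Return 1 (go to work) if the weighted sum reaches the threshold, else 0."""
    linear_combination = sum(x * w for x, w in zip(inputs, weights))
    return 1 if linear_combination >= threshold else 0


def decision_table():
    """Evaluate the perceptron on all four combinations of binary inputs."""
    return {inputs: perceptron_output(inputs) for inputs in product((0, 1), repeat=2)}


if __name__ == "__main__":
    for (weather, weekday), output in decision_table().items():
        print(f"weather={weather} weekday={weekday} -> output={output}")
```

Because the weekday weight alone (6) exceeds the threshold while the weather weight alone (2) does not, the output follows the weekday input, which matches the stated preference that the weekend matters more than the weather.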

To make things a little simpler for training later, let’s make a small readjustment to the above
formula. Let’s move the threshold to the other side of the inequality, and replace it with what’s
known as the neuron’s bias. Now we can rewrite the firing condition as:
w1x1 + w2x2 + b ≥ 0,
where b = −t is the bias.
The perceptron learning rule can be viewed as a form of gradient descent: it aims to
minimize the error by adjusting the weights in the direction that reduces the error.
It's important to note that the perceptron is effective only for linearly separable data. For non-linearly
separable data or more complex problems, multilayer perceptrons (MLPs) with non-linear activation
functions are often used.
TRAINING IN PERCEPTRONS
Training a perceptron involves adjusting its weights based on the error between the predicted output
and the target output. Here are the steps involved in training a perceptron:

A perceptron network is capable of classifying patterns into two or more categories. The
perceptron is trained using the perceptron learning rule. We will first consider classification into two
categories and then the general multiclass classification later. For classification into only two
categories, all we need is a single output neuron. Here we will use bipolar neurons. The simplest
architecture that could do the job consists of a layer of N input neurons, an output layer with a single
output neuron, and no hidden layers. This is the same architecture as we saw before for Hebb
learning. However, we will use a different transfer function here for the output neurons, as given
in eq (7). Figure 7 represents a single-layer perceptron network.
Equation 7 gives the bipolar activation function, which is the most common function used in
perceptron networks. The inputs arising from the problem space are collected by the sensors and
fed to the association units. Association units are responsible for associating the inputs based on
their similarities: each unit groups similar inputs, hence the name association unit. A single input
from each group is given to the summing unit. Weights are initially fixed at random and assigned to
these inputs. The net value is calculated using the expression
x = Σ wi ai − θ        (8)
This value is given to the activation function unit to get the final output response. The actual
output is compared with the target (desired) output. If they are the same, training can stop; otherwise
there is an error and the weights have to be updated. The error is given as δ = b − s, where b is the
desired (target) output and s is the actual output of the machine. The weights are updated based on
the perceptron learning law given in equation 9.
The weight change is given as Δwi = η δ ai, where η is the learning rate. So the new weight is given as
wi(new) = wi(old) + Δwi        (9)

Perceptron Algorithm
• Step 1: Initialize weights and bias. For simplicity, set weights and bias to zero. Set the learning
rate α in the range of zero to one.
• Step 2: While the stopping condition is false, do steps 3-8
• Step 3: For each training pair s:t, do steps 4-7
• Step 4: Set activations of input units xi = ai
• Step 5: Calculate the summing part value Net = Σ ai wi − θ (the bias b of Step 7 plays the role of −θ)
• Step 6: Compute the response y of the output unit based on the activation function
• Step 7: Update weights and bias if an error occurred for this pattern (if y is not equal to t)
wi(new) = wi(old) + α t xi, and b(new) = b(old) + α t
Else wi(new) = wi(old) and b(new) = b(old)
• Step 8: Test the stopping condition
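The algorithm above can be sketched in plain Python as follows. Several details are assumptions made for this illustration and are not fixed by the notes: the bipolar activation is taken to be +1 when the net input is positive and −1 otherwise, the stopping condition is "no weight changed during a full pass", and the training set is the logical AND function with bipolar inputs and targets.

```python
def activation(net):
    """Bipolar step function (assumed form): +1 if net > 0, otherwise -1."""
    return 1 if net > 0 else -1


def train_perceptron(samples, alpha=1.0, max_epochs=100):
    """Train a single-output perceptron on (inputs, target) pairs with bipolar targets."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs   # Step 1: weights and bias start at zero
    bias = 0.0
    for _ in range(max_epochs):  # Step 2: loop until the stopping condition holds
        changed = False
        for x, t in samples:     # Step 3: visit every training pair
            net = sum(w * xi for w, xi in zip(weights, x)) + bias  # Step 5
            y = activation(net)                                    # Step 6
            if y != t:                                             # Step 7
                weights = [w + alpha * t * xi for w, xi in zip(weights, x)]
                bias += alpha * t
                changed = True
        if not changed:          # Step 8: stop once a full pass makes no update
            break
    return weights, bias


def predict(weights, bias, x):
    """Classify one input vector with the trained weights and bias."""
    return activation(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical training set: logical AND with bipolar inputs and targets.
AND_SAMPLES = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

if __name__ == "__main__":
    w, b = train_perceptron(AND_SAMPLES)
    print("weights:", w, "bias:", b)
```

On this data the loop stops after the second pass, because AND is linearly separable; on data that is not linearly separable it would simply run until `max_epochs`.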

Limitations of single layer perceptrons:


• Uses only a binary (step) activation function
• Can represent only linear decision boundaries
• Uses supervised learning, so the desired (target) output must be provided for every training pattern
• Training time can be long
• Cannot solve linearly inseparable problems

Multi-Layer Perceptron Model:


Figure 8 is the general representation of a multilayer perceptron network. Between the input
and output layers there are one or more additional layers, known as hidden layers.
Cost Function
In Artificial Neural Networks (ANNs), the cost function, also known as the loss function or objective
function, is a crucial component used to measure the difference between the predicted output and the
actual target values. The goal during the training of a neural network is to minimize this cost function.
The cost function quantifies how well the model is performing, and the optimization algorithm adjusts
the model's parameters to minimize this cost.
The choice of a suitable cost function depends on the type of problem you are trying to solve. Here
are some common cost functions for different types of tasks:
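As an illustration, here is a minimal sketch of two widely used cost functions: mean squared error, typically used for regression, and binary cross-entropy, typically used for two-class classification. The function names and the clipping constant `eps` are choices made for this example only.

```python
import math


def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def binary_cross_entropy(targets, probabilities, eps=1e-12):
    """Average negative log-likelihood for 0/1 targets and predicted probabilities."""
    total = 0.0
    for t, p in zip(targets, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)


if __name__ == "__main__":
    print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))
    print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```

Both functions return zero (or nearly zero) for perfect predictions and grow as predictions move away from the targets, which is the property an optimizer relies on.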

• Introduction
• Optimization
• Gradient Descent
• Types of Gradient Descent
• Batch Gradient Descent
• Stochastic Gradient Descent
The objective of optimization is to find the best possible solution to a real-life problem.
• It means getting the optimal output for your problem.
• In machine learning, optimization is slightly different.
• Generally, while optimizing, we know exactly what our data looks like and which areas we
want to improve.
• But in machine learning we have no clue what our “new data” will look like, let alone how to
optimize on it.
• Therefore, in machine learning, we perform optimization on the training data and check the
model's performance on new validation data.

Gradient Descent

Gradient Descent (GD) is a widely used optimization algorithm in machine learning and deep
learning that minimizes the cost function of a neural network model during training.
It works by iteratively adjusting the weights or parameters of the model in the direction of the
negative gradient of the cost function until a minimum of the cost function is reached or closely
approached.
It is also employed in other iterative optimization problems.
The goal of Gradient Descent is to find the parameters (weights and biases) that minimize
the cost function.
Here's a high-level overview of the Gradient Descent process:
1. Initialize Parameters:
Start with initial values for the parameters (weights and biases) of the model. These values can be
set randomly or through some other initialization technique.
2. Compute the Cost Function:
Use the current parameter values to compute the cost or loss function, which measures how well
the model performs on the training data.
3. Compute Gradients:
Calculate the gradient of the cost function with respect to each parameter. This is done using
techniques like backpropagation in neural networks.
4. Update Parameters:
Adjust the parameters in the opposite direction of the gradients to reduce the cost.
The update rule is typically defined as:
new_parameter = old_parameter − learning_rate × gradient
The learning rate (α) is a hyperparameter that controls the size of the steps taken during
optimization.
5. Repeat:
Repeat steps 2-4 until a stopping criterion is met, such as reaching a predefined number of
iterations or achieving a satisfactory level of performance.
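The five steps above can be sketched on a deliberately small problem: fitting a line y = w·x + b by minimizing the mean squared error. The model, the data, and the hyperparameter values below are assumptions chosen for illustration; for a neural network the gradients in step 3 would come from backpropagation instead of the closed-form expressions used here.

```python
def gradient_descent(xs, ys, learning_rate=0.05, n_iterations=2000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0                      # 1. initialize parameters
    n = len(xs)
    cost_history = []
    for _ in range(n_iterations):        # 5. repeat for a fixed number of iterations
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        cost = sum(e * e for e in errors) / n          # 2. compute the cost
        cost_history.append(cost)
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n  # 3. gradients
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w      # 4. step against the gradient
        b -= learning_rate * grad_b
    return w, b, cost_history


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]            # generated from y = 2x + 1
    w, b, history = gradient_descent(xs, ys)
    print(f"w={w:.4f} b={b:.4f} first cost={history[0]:.4f} last cost={history[-1]:.2e}")
```

With a learning rate that is too large the cost would grow instead of shrinking, which is why the learning rate is treated as a hyperparameter to tune.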
Types of Gradient Descent
• Batch Gradient Descent: Uses the entire training dataset to compute the gradient of the cost
function. It can be computationally expensive for large datasets.
• Stochastic Gradient Descent (SGD): Updates the parameters based on the gradient of the cost
function for a single randomly selected training example. This is computationally less expensive
per update but introduces more variability in the updates.
• Mini-Batch Gradient Descent: Strikes a balance between Batch and Stochastic Gradient Descent
by updating parameters based on a small random subset (mini-batch) of the training data.
• Stochastic Gradient Descent (SGD) is an optimization algorithm used in machine learning
and artificial intelligence to find the best parameters for a given model.
• It is a popular method for training on large datasets, as it allows for efficient computation and,
with mini-batches, parallelization.
• In practice, the name SGD is often also used for the mini-batch variant, in which the dataset is
divided into smaller batches and the model's parameters are updated based on the gradient of the
loss function for each batch.
• This process is repeated iteratively until the model converges to a (possibly local) optimum.
• The term "stochastic" refers to the random selection of samples from the dataset in each
iteration; the resulting noise in the updates can help the algorithm avoid getting stuck in local minima.
• SGD is widely used in various machine learning applications, including linear regression,
logistic regression, and neural networks. It is an essential tool for training deep learning
models, as it enables efficient learning from large datasets with high-dimensional features.
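The three variants differ only in how many examples contribute to each update, so one function with a `batch_size` parameter can express all of them. This is a sketch under assumptions made for illustration: a one-variable linear model, a mean squared error cost, a fixed random seed, and small noise-free data.

```python
import random


def minibatch_gradient_descent(xs, ys, batch_size, learning_rate=0.05,
                               n_epochs=500, seed=0):
    """Fit y = w * x + b, updating the parameters once per mini-batch.

    batch_size == len(xs) gives batch gradient descent,
    batch_size == 1 gives stochastic gradient descent,
    anything in between gives mini-batch gradient descent.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(n_epochs):
        rng.shuffle(indices)             # the "stochastic" part: random example order
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [w * xs[i] + b - ys[i] for i in batch]
            grad_w = 2.0 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2.0 * sum(errors) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]            # generated from y = 2x + 1
    for size in (4, 2, 1):
        print(size, minibatch_gradient_descent(xs, ys, batch_size=size))
```

On real, noisy data the single-example variant would keep fluctuating around the minimum unless the learning rate is reduced over time; the data here is noise-free, so all three settings settle on the same line.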
A Feedforward Neural Network (FFNN) is a type of artificial neural network that
consists of multiple layers of interconnected nodes, where information flows only
in one direction, from the input layer to the output layer, passing through one or
more hidden layers. Here are the steps involved in building and training a
Feedforward Neural Network:
1. Define the network architecture: Specify the number of layers (input, hidden, and
output) and the number of nodes in each layer. The architecture can be determined
based on the problem's complexity and the available computational resources.
2. Initialize weights: Assign random or small initial weights to the connections
between nodes in the network. These weights will be adjusted during the training
process.
3. Input data preparation: Preprocess the input data, if necessary, by scaling,
normalizing, or encoding it to ensure it's suitable for the neural network.
4. Forward propagation: Present the preprocessed input data to the network. The
input nodes pass their values to the nodes in the first hidden layer, which then pass
their activations to the next hidden layer (if present), and so on, until the output
layer is reached. Each node in the network applies an activation function to its
weighted input values.
5. Calculate output: The final layer (output layer) produces the network's predicted
output based on the input data and the learned weights.
6. Evaluate error: Compute the difference between the predicted output and the
desired output (target or label). This difference represents the error made by the
network.
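Steps 1 to 6 can be sketched in plain Python as below. The layer sizes, the sigmoid activation, the weight range, and the squared-error measure are assumptions made for this illustration; the notes do not prescribe any of them.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_network(layer_sizes, seed=0):
    """Steps 1-2: create one (weights, biases) pair per layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(inputs, layers):
    """Steps 4-5: pass activations from layer to layer and return the output layer's values."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def squared_error(outputs, targets):
    """Step 6: sum of squared differences between predicted and desired outputs."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))


if __name__ == "__main__":
    network = build_network([3, 4, 2])   # 3 inputs, 4 hidden nodes, 2 outputs
    outputs = forward([0.1, 0.5, -0.2], network)
    print(outputs, squared_error(outputs, [1.0, 0.0]))
```

Information only moves from `inputs` towards the output list, with no loops back to earlier layers, which is what makes the network feedforward.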
• Back propagation is a fundamental technique used in the training of neural networks which
helps in optimizing the weights and biases of a model based on the error between the
predicted output and the actual output.
• The basic idea behind this technique is to calculate the gradient of the loss function with
respect to each weight and bias in the model.
• The gradient tells us how much the loss function will be affected by changing the weights and
bias by a small amount.
• The main goal is to reduce the loss which is achieved by iteratively updating the weights and
bias of the model based on the gradient.
Backpropagation Steps:
1. Initialization:
1. Initialize the weights and biases of the neural network with small random values.
2. Define the learning rate, which determines the step size during weight updates.
2. Forward Pass:
1. Input a set of training data into the network to obtain predictions.
2. Propagate the input through the network layer by layer, calculating the weighted sum
and applying the activation function at each node.
3. Error Calculation:
1. Compare the predicted output with the actual target output to calculate the error.
2. Commonly used error metrics include Mean Squared Error (MSE) for regression and
Cross-Entropy Loss for classification.
4. Backward Pass (Backpropagation Proper):
1. Propagate the error backward through the network to compute the gradients of the
error with respect to the weights.
2. Calculate the gradient of the error with respect to the output of each node in the
network.
5. Weight Update:
1. Update the weights and biases using the calculated gradients and the learning rate.
2. This step involves adjusting the weights in the direction that minimizes the error.
6. Repeat:
1. Repeat the process for a specified number of epochs or until convergence.
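The steps above can be sketched for a network with one hidden layer and a single output. Everything specific here is an assumption for illustration: sigmoid activations, the per-example loss 0.5·(y − t)², the parameter names `W1`, `b1`, `W2`, `b2`, and the XOR training set.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Step 1: small random weights, zero biases."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    """Step 2: hidden activations h and network output y."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    y = sigmoid(sum(w * hj for w, hj in zip(p["W2"], h)) + p["b2"])
    return h, y


def loss(p, x, t):
    """Step 3: squared error for one example."""
    _, y = forward(p, x)
    return 0.5 * (y - t) ** 2


def backward(p, x, t):
    """Step 4: gradients of the loss with respect to every weight and bias."""
    h, y = forward(p, x)
    delta_out = (y - t) * y * (1.0 - y)              # error signal at the output node
    delta_hidden = [delta_out * w * hj * (1.0 - hj)  # error propagated back to hidden nodes
                    for w, hj in zip(p["W2"], h)]
    return {
        "W1": [[d * xi for xi in x] for d in delta_hidden],
        "b1": delta_hidden,
        "W2": [delta_out * hj for hj in h],
        "b2": delta_out,
    }


def update(p, g, lr):
    """Step 5: move every parameter against its gradient."""
    p["W1"] = [[w - lr * gw for w, gw in zip(row, grow)]
               for row, grow in zip(p["W1"], g["W1"])]
    p["b1"] = [b - lr * gb for b, gb in zip(p["b1"], g["b1"])]
    p["W2"] = [w - lr * gw for w, gw in zip(p["W2"], g["W2"])]
    p["b2"] = p["b2"] - lr * g["b2"]


def train(p, data, lr=0.5, epochs=1000):
    """Step 6: repeat forward pass, backward pass and update for several epochs."""
    for _ in range(epochs):
        for x, t in data:
            update(p, backward(p, x, t), lr)
    return p


XOR_DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

if __name__ == "__main__":
    params = train(init_params(2, 3), XOR_DATA)
    for x, t in XOR_DATA:
        print(x, t, round(forward(params, x)[1], 3))
```

A common way to check a hand-written backward pass is to compare one of its gradients with a finite-difference estimate of the loss; whether the trained network actually fits XOR depends on the random initialization and the number of epochs.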

Convergence in a neural network refers to the process where the network's performance or loss
function stabilizes and stops improving significantly with further training.
In other words, it is when the network has learned as much as it can from the given dataset and
adjustments to the weights become minimal.
During the training process, a neural network adjusts its internal parameters (weights) to minimize
the difference between the predicted output and the actual output.
This process continues until the network reaches a point where it can accurately classify or predict
the input data with a satisfactory level of accuracy.
At this stage, the network is said to have converged.
It is essential to note that the term "convergence" can be used in two ways:
Local Convergence: When training settles at a local minimum, so that the network's performance
stops improving within a specific region of the parameter space.
Global Convergence: When training settles at the global minimum, the best solution over the
entire parameter space.
In practice, neural networks often converge to a local minimum of the loss function. Techniques
like adaptive learning rates, batch normalization, or different network architectures may help
training reach better minima, but they do not guarantee global convergence.
In summary, convergence in a neural network is when the model's performance becomes stable
and stops improving significantly, marking the end of the learning process.

Local minima in a neural network refer to points in the network's error surface where the
error or loss function has a lower value than its surroundings, but there exists an even lower value
(the global minimum) somewhere else in the error space.
In simpler terms, it's a situation where the network has found a solution that works relatively well,
but it's not the best possible solution.
During the training process, a neural network adjusts its weights to minimize the error or loss
between the predicted output and the actual output. However, due to the complex nature of the
error surface and the high-dimensional space, the network may converge to a local minimum
instead of the global minimum, which would be the ideal solution.
Local minima can be a challenge in neural network training because they can lead to suboptimal
performance. To overcome this issue, various techniques can be employed, such as:
Initialization: Properly initializing the weights of the network can help it converge towards the
global minimum.
Learning rate: Adjusting the learning rate during training can help the network escape local
minima. A higher learning rate may help escape local minima faster, while a lower learning rate
can provide more precise adjustments near the global minimum.
Regularization: Techniques like L1, L2 regularization, or dropout can help prevent overfitting and
encourage the network to generalize better, making it less likely to get stuck in local minima.
Architecture: Designing the network architecture in a way that promotes better generalization can
also help avoid local minima.
Stochastic Gradient Descent (SGD) variations: Using variations of SGD, such as Adam,
RMSprop, or Adagrad, can help optimize the learning process and potentially escape local
minima.
Batch normalization: This technique helps normalize the inputs to each layer, making the
optimization process smoother and potentially preventing the network from getting stuck in local
minima.
In summary, local minima in neural networks can be a challenge during training, but various
techniques can be employed to reduce their impact and help the network find a better minimum
and achieve better performance.

The representational power of a feedforward neural network lies in its
ability to learn and model complex relationships within data. These networks consist of multiple
layers of interconnected nodes, also known as artificial neurons, that process and transform
information.
Feedforward neural networks are particularly useful for tasks such as pattern recognition,
classification, and prediction. They achieve this by learning to map input data to desired output
labels through a process called training. During training, the network adjusts the weights of its
connections to minimize the difference between the predicted output and the actual output.

The representational power of these networks is influenced by several factors:


1. Architecture: The number of layers and nodes in the network affects its ability to represent
and learn complex patterns. Deep networks with multiple layers can capture more intricate
relationships compared to shallow networks with fewer layers.
2. Activation functions: The type of activation function used in the network can impact its
representational power. Common activation functions include sigmoid, ReLU (Rectified
Linear Unit), and tanh (hyperbolic tangent). Different activation functions can better suit
different types of problems or data.
3. Non-linearity: Feedforward neural networks are capable of learning non-linear relationships
between input and output data, which is a significant aspect of their representational power.
4. Regularization: Overfitting can limit how useful a network's representational power is in
practice. Regularization techniques, such as L1 and L2 regularization, help prevent overfitting
and improve the network's ability to generalize to new data.
5. Training algorithms: The choice of optimization algorithm used during training also
affects how much of the network's representational power is realized. Some popular
optimization algorithms include stochastic gradient descent, Adam, and RMSProp.
In summary, the representational power of feedforward neural networks is a result of their
architecture, activation functions, non-linearity, regularization techniques, and training
algorithms. These factors combined enable them to learn and model complex relationships within
data, making them valuable tools in various machine learning applications.
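A small, concrete illustration of the role of hidden layers and non-linearity is the XOR function, which a single perceptron cannot represent because the two classes are not linearly separable. The sketch below uses hand-picked weights (not learned ones) and a 0/1 step activation, both chosen for this example, to show that one hidden layer with two nodes is enough.

```python
def step(z):
    """Threshold activation: 1 if the net input is positive, else 0."""
    return 1 if z > 0 else 0


def xor_network(x1, x2):
    """Two hidden nodes (OR and AND) feeding one output node."""
    h_or = step(x1 + x2 - 0.5)         # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)        # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)    # fires if OR is on and AND is off


if __name__ == "__main__":
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, xor_network(x1, x2))
```

Each hidden node draws one straight line in the input plane; the output node combines the two half-planes into a region that no single line could carve out.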

Hidden layer representation in neural network


In a neural network, a hidden layer is an intermediate layer between the input layer and the output
layer. It plays a crucial role in the network's ability to learn and represent complex patterns in the
data. The hidden layer(s) contain multiple neurons, each performing a specific computation based
on the input data and the learned weights.
For a batch of inputs, the representation computed by a hidden layer can be written as a matrix,
where each row holds the activation values of the layer's neurons for one input example and each
column corresponds to one neuron. These activation values are calculated by applying an activation
function (such as sigmoid, ReLU, or tanh) to the weighted sum of the input values.
The hidden layer(s) help transform the input data into a more meaningful and useful
representation for the neural network to make predictions or classify the data effectively. The
number of hidden layers and the number of neurons in each layer can significantly impact the
network's performance and its ability to model complex patterns in the data.

Generalization in neural networks refers to the ability of a model to perform well on new,
unseen data, after being trained on a specific dataset. In simpler terms, it is the capability of a
neural network to learn from a training set and apply that knowledge to make accurate predictions
on unseen data.
Neural networks are designed to recognize patterns in data, and during the training process, they
learn to adjust their internal parameters to minimize the error between the predicted output and
the actual output. The goal is to find a balance between learning the patterns in the training data
(called "fitting") and generalizing that learning to unseen data.
Overfitting is a common issue in neural networks, where the model becomes too specialized to the
training data and struggles to generalize to new data. To avoid overfitting, various techniques can
be employed, such as:
1. Adding more training data: By increasing the size of the training dataset, the model can learn
more general patterns.
2. Regularization: This involves adding constraints to the model to prevent it from overfitting.
Common regularization techniques include L1 and L2 regularization.
3. Early stopping: This technique monitors the model's performance on a validation dataset
during training. When the validation performance starts to degrade, training is stopped to
prevent overfitting.
4. Using simpler models: Sometimes, using a simpler model with fewer parameters can help in
generalization.
5. Data augmentation: By artificially creating new training examples from the existing ones, the
model learns to recognize patterns in different variations of the data.
In summary, generalization in neural networks is crucial for their practical application. By
employing various techniques, we can ensure that a neural network can learn from the training
data and apply that knowledge to make accurate predictions on new, unseen data.

Overfitting & stopping criterion


Overfitting occurs when a machine learning model learns the training data too well, leading to
poor performance on unseen or new data (test data or validation data). This happens when the
model becomes too complex or has too many parameters relative to the amount of training data
available. The model starts to memorize the training data patterns rather than generalizing from
them.

To mitigate overfitting, several strategies can be employed:

1. Regularization: This technique adds a penalty term to the cost function to reduce the
complexity of the model. Common forms are L1 (Lasso) and L2 (Ridge) regularization; weight
decay is closely related to L2.

2. Cross-validation: Split the dataset into training and validation sets (in k-fold cross-validation,
the split is repeated k times with a different validation fold each time). Periodically evaluate the
model's performance on the validation data; a growing gap between training and validation
performance indicates overfitting.

3. Early stopping: Monitor the loss on the validation set during training. If the validation loss
starts to increase, stop training to avoid overfitting.

4. Simplify model architecture: Use a simpler model with fewer layers or nodes, or apply pruning
techniques to remove unnecessary connections.

5. Data augmentation: Increase the amount of training data by artificially generating new samples
from the existing data.

6. Feature selection: Choose a smaller set of relevant features that contribute more to the model's
performance.
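The regularization strategy in item 1 can be sketched as a penalty added to an ordinary cost. In this illustration the base cost is assumed to be the mean squared error, and the names `lam` (the regularization strength) and `kind` are made up for the example.

```python
def mean_squared_error(targets, predictions):
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def l1_penalty(weights, lam):
    """L1 (Lasso) penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 (Ridge) penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def regularized_cost(targets, predictions, weights, lam, kind="l2"):
    """Data-fit term plus a penalty that grows with the size of the weights."""
    penalty = l2_penalty(weights, lam) if kind == "l2" else l1_penalty(weights, lam)
    return mean_squared_error(targets, predictions) + penalty


if __name__ == "__main__":
    weights = [3.0, -4.0]
    print(regularized_cost([1.0], [1.0], weights, lam=0.1, kind="l2"))
    print(regularized_cost([1.0], [1.0], weights, lam=0.1, kind="l1"))
```

Because the penalty increases with the magnitude of the weights, minimizing the regularized cost trades a slightly worse fit on the training data for smaller weights, which tends to give a simpler model.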
A stopping criterion is a condition that determines when to stop training the model. It helps
prevent overfitting and ensures that the model generalizes well to new data. Some common
stopping criteria include:

1. Reaching a minimum error threshold: Stop training when the error (cost, loss) on the validation
set reaches a predefined threshold or starts to increase.

2. Maximum number of epochs: Set a maximum number of iterations (epochs) for the training
process. Once the maximum number of epochs is reached, stop training.

3. Early stopping: As mentioned earlier, this technique stops training when the validation loss
starts to increase, indicating overfitting.

4. Convergence: Stop training when the model's parameters (weights) stop changing significantly,
indicating that the model has reached a minimum in the cost function.

5. Computational budget: Stop training when the allocated computational resources (time,
memory, or processing power) are exhausted.

Choosing an appropriate stopping criterion depends on the specific problem, available resources,
and the desired trade-off between training time and model performance.
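Two of the criteria above, early stopping and a maximum number of epochs, can be combined in a few lines. This sketch works on a list of validation losses recorded once per epoch (its length acts as the epoch limit); the `patience` and `min_delta` parameters are a common refinement assumed here, not something the notes specify, and the loss values in the demo are invented.

```python
def early_stopping_epoch(validation_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs, or when the sequence ends.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, current_loss in enumerate(validation_losses):
        if current_loss < best_loss - min_delta:
            best_loss, best_epoch = current_loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(validation_losses) - 1, best_epoch


if __name__ == "__main__":
    # Invented losses: they fall until epoch 3 and rise afterwards (overfitting).
    losses = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.7, 0.9]
    stop, best = early_stopping_epoch(losses, patience=3)
    print(f"stop at epoch {stop}, keep the weights from epoch {best}")
```

In a real training loop the weights from the best epoch would be saved and restored after stopping, so the patience window does not cost any validation performance.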
