
Notes of ANN
Subject: Artificial Neural Networks
Section: Alpha

3. Fully Connected, Partially Connected, and Linearly Connected Neural Networks
 Fully Connected: Every neuron in one layer is connected to every neuron in the next layer.
 Partially Connected: Only a subset of neurons is connected between layers, reducing computation.
 Linearly Connected: Neurons are connected in a linear sequence, often used in recurrent networks.

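A minimal NumPy sketch contrasting a fully connected layer with a partially connected one expressed as a masked weight matrix (the layer sizes and the mask pattern are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=4)        # 4 input features
W = rng.normal(size=(3, 4))   # fully connected: every output neuron sees every input
b = np.zeros(3)

fully = W @ x + b             # dense layer output

# A partial connection can be emulated by zeroing some weights,
# so each output neuron only sees a subset of the inputs.
mask = np.array([[1, 1, 0, 0],
                 [0, 1, 1, 0],
                 [0, 0, 1, 1]])
partial = (W * mask) @ x + b

print(fully)
print(partial)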

6. Vanishing Gradient Problem
In deep networks, gradients shrink as they are propagated backward through many layers, so the weights of early layers barely update. Sigmoid and Tanh activations are especially prone to this.
Solution: Use ReLU, Batch Normalization, or Skip Connections (ResNets).

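The effect is easy to see numerically: the sigmoid derivative never exceeds 0.25, so the chain-rule product across many layers shrinks toward zero. A minimal NumPy sketch (the 20-layer depth and the pre-activation value 2.0 are arbitrary choices):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25

# Chain-rule product of 20 sigmoid derivatives, as backprop would form across 20 layers
grad = 1.0
for _ in range(20):
    grad *= sigmoid_grad(2.0)

print(grad)  # on the order of 1e-20: the gradient has effectively vanished
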
7. Exploding Gradient Problem
When gradients become too large, the weight updates blow up and training diverges. This also happens in deep networks.
Solution: Use gradient clipping and proper weight initialization.

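A minimal sketch of gradient clipping by global norm in NumPy (the max_norm of 1.0 and the example gradients are arbitrary):

import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Rescale all gradient arrays so their combined L2 norm is at most max_norm
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads]

grads = [np.array([30.0, -40.0]), np.array([5.0])]  # unusually large gradients
clipped = clip_by_global_norm(grads)
print(clipped)  # same direction, combined norm scaled down to 1.0
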
8. Hyperparameters and Their Types
Hyperparameters control how a network learns:
 Learning Rate – Controls how much weights update.
 Batch Size – Number of samples per training step.
 Epochs – Number of times the model sees the full dataset.
 Hidden Layers – Number of layers between input and output.
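
A sketch of where these hyperparameters appear in a typical mini-batch training loop (all values below are made up for illustration):

learning_rate = 0.01       # how far weights move on each update
batch_size = 32            # samples per training step
epochs = 10                # full passes over the dataset
hidden_layers = [64, 64]   # sizes of the layers between input and output

n_samples = 1000
steps_per_epoch = n_samples // batch_size

for epoch in range(epochs):
    for step in range(steps_per_epoch):
        # forward pass on one batch, compute loss, backpropagate, then:
        # weights -= learning_rate * gradients
        pass
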
9. Forward Propagation
The process of calculating the network's output from its inputs by passing them through the weights and activation functions, layer by layer.

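A minimal forward pass through one hidden layer in NumPy (the sizes, weights, and inputs are arbitrary):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, -1.2, 3.0])               # input features
W1, b1 = np.full((2, 3), 0.1), np.zeros(2)   # hidden-layer weights and biases
W2, b2 = np.full((1, 2), 0.1), np.zeros(1)   # output-layer weights and biases

h = sigmoid(W1 @ x + b1)                     # hidden activations
y_hat = sigmoid(W2 @ h + b2)                 # network output
print(y_hat)
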
10. Backward Propagation
The process by which the network updates its weights using gradients of the loss function, computed with the chain rule of differentiation.

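A hand-rolled chain-rule sketch for a single sigmoid neuron trained with squared error (the data, learning rate, and step count are made up):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, -1.2, 3.0])     # one training example
y = 1.0                            # its target
w, b = np.array([0.1, 0.1, 0.1]), 0.0
lr = 0.1

for _ in range(100):
    z = w @ x + b                  # forward pass
    y_hat = sigmoid(z)
    # Chain rule: dL/dw = dL/dy_hat * dy_hat/dz * dz/dw
    dL_dyhat = 2.0 * (y_hat - y)   # derivative of squared error
    dyhat_dz = y_hat * (1.0 - y_hat)
    w -= lr * dL_dyhat * dyhat_dz * x
    b -= lr * dL_dyhat * dyhat_dz

print(sigmoid(w @ x + b))          # prediction has moved toward the target 1.0
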
11. Loss Function (Cost Function)
The loss function calculates the error between the predicted and the actual output.
Common types:
 Mean Squared Error (MSE) – For regression.
 Cross-Entropy Loss – For classification.

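Both losses written out in NumPy (the toy predictions are made up):

import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(mse(np.array([3.0, 5.0]), np.array([2.5, 5.5])))                # 0.25
print(binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.2])))
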
12. Concepts: Epoch, Learning Rate, Iteration, Batch Size, Normalization, Overfitting, Underfitting
 Epoch: One complete pass through the dataset.
 Learning Rate: Step size for weight updates.
 Iteration: One batch processed in training.
 Batch Size: Number of samples processed before updating weights.
 Normalization: Scaling inputs for better performance.
 Overfitting: The model memorizes the training data and performs poorly on new data.
 Underfitting: The model is too simple and fails to learn the underlying pattern.

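A small worked example tying these terms together (all numbers are made up):

import numpy as np

n_samples = 1000
batch_size = 50
epochs = 5

iterations_per_epoch = n_samples // batch_size     # 20 weight updates per epoch
total_iterations = iterations_per_epoch * epochs   # 100 updates over all epochs
print(iterations_per_epoch, total_iterations)

# Min-max normalization: scale each feature into [0, 1]
X = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]])
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_norm)
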
Activation Functions: Explanation & Formulas

1. Binary Step Function
The simplest activation function, used in perceptrons for classification tasks.
f(x) = 1 if x ≥ 0, else 0
 Use Case: Used in early perceptron models and simple classification tasks.
 Limitation: Not useful for deep learning, since its derivative is zero everywhere it is defined and therefore provides no gradient for learning.

2. Linear Activation Function
A basic function where the output is proportional to the input:
f(x) = ax
where a is a constant.
 Use Case: Used in regression problems.
 Limitation: Cannot introduce non-linearity; stacking linear layers collapses into a single linear transformation, making it unsuitable for deep networks.

3. Sigmoid (Logistic) Function
A smooth S-shaped function that maps any real number to a range between 0 and 1:
f(x) = 1 / (1 + e^(-x))
 Use Case: Used in binary classification problems.
 Limitations:
o Causes vanishing gradients for large or small x.
o Output is not centered around 0, which slows convergence.

4. Hyperbolic Tangent (Tanh) Function
An improved version of the sigmoid that maps values between -1 and 1:
f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
 Use Case: Used in hidden layers to center activations around zero.
 Limitation: Still suffers from vanishing gradients for large or small x.

5. Rectified Linear Unit (ReLU)
The most commonly used activation function in deep learning:
f(x) = max(0, x)
 Use Case: Used in hidden layers of deep neural networks.
 Advantages:
o Efficient and fast to compute.
o Reduces the vanishing gradient problem.
 Limitation: Suffers from the dying ReLU problem (neurons stop learning when x ≤ 0).

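A tiny illustration of the dying ReLU issue: for negative pre-activations both the output and the gradient are zero, so the neuron receives no learning signal (the values below are arbitrary):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)

z = np.array([-3.0, -0.1, 0.5, 2.0])   # pre-activations
print(relu(z))       # [0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 1. 1.]  -> zero gradient wherever x <= 0
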
6. Leaky ReLU
A modified version of ReLU that allows small negative values instead of zero:
f(x) = x if x > 0, else αx
where α is a small constant (e.g., 0.01).
 Use Case: Helps avoid the dying ReLU problem.

7. Exponential Linear Unit (ELU)
Similar to Leaky ReLU but with a smoother transition for negative inputs:
f(x) = x if x > 0, else α(e^x - 1)
 Use Case: Reduces bias shift and helps deeper networks train better.

8. Softmax Function
Converts a vector of scores into probabilities that sum to 1:
f(x_i) = e^(x_i) / Σ_j e^(x_j)
 Use Case: Used in the output layer for multi-class classification.
 Limitation: Large input values can cause numerical instability (usually handled by subtracting the maximum before exponentiating).

9. Swish Function
A self-gated activation function proposed by Google researchers:
f(x) = x · sigmoid(x)
 Use Case: Used in modern deep learning architectures.
 Advantage: Often performs better than ReLU on some tasks.

10. GELU (Gaussian Error Linear Unit)
Improves upon ReLU and Swish by weighting inputs with the Gaussian CDF:
f(x) = x · Φ(x)
where Φ(x) is the cumulative distribution function of the standard Gaussian distribution.
 Use Case: Used in transformer models like BERT.

11. SELU (Scaled Exponential Linear Unit)
A variation of ELU that normalizes activations automatically:
f(x) = λx if x > 0, else λα(e^x - 1)
where λ and α are fixed constants (λ ≈ 1.0507, α ≈ 1.6733).
 Use Case: Used in self-normalizing neural networks (SNNs).

Choosing the Right Activation Function

Activation Function | Use Case                   | Limitation
Binary Step         | Simple classifiers         | Not differentiable
Linear              | Regression                 | No non-linearity
Sigmoid             | Binary classification      | Vanishing gradient
Tanh                | Hidden layers              | Still vanishes
ReLU                | Deep learning              | Dying ReLU problem
Leaky ReLU          | Fixing dying ReLU          | May not always help
ELU                 | Improved ReLU              | More computational cost
Softmax             | Multi-class classification | Large input values cause instability
Swish               | Advanced deep learning     | Computationally expensive
GELU                | Transformer models         | High complexity
SELU                | Self-normalizing networks  | Requires specific initialization
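
For reference, the functions in the table can be sketched in NumPy as follows (the GELU uses the common tanh approximation, and the SELU constants are the standard published values; everything else follows the formulas above):

import numpy as np

def binary_step(x): return np.where(x >= 0, 1.0, 0.0)
def linear(x, a=1.0): return a * x
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x): return np.tanh(x)
def relu(x): return np.maximum(0.0, x)
def leaky_relu(x, alpha=0.01): return np.where(x > 0, x, alpha * x)
def elu(x, alpha=1.0): return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
def swish(x): return x * sigmoid(x)
def selu(x, lam=1.0507, alpha=1.6733): return lam * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def gelu(x):
    # tanh approximation of x * Phi(x), widely used in transformer implementations
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))
print(swish(x))
print(gelu(x))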