
Deep Learning

Neural Networks Report


Based on:
Hands-On Machine Learning book & StatQuest
Task 3

Mhd Anas Al-Sheikh Bakri


1. Introduction

In today's era of artificial intelligence (AI), two major approaches stand out: machine
learning and deep learning. While machine learning has been around for some time,
deep learning has revolutionized how we tackle complex problems. This introduction
aims to highlight the main differences between the two and explain why deep
learning has become so important, especially with the explosion of data from the
internet and social media.

1.1 The Main Difference between Machine Learning & Deep Learning

Machine learning and deep learning both teach computers to learn from data.
However, they differ in how they handle the data. Traditional machine learning
often requires humans to manually pick out important features from the data.
In contrast, deep learning does this automatically. Deep neural networks, the
backbone of deep learning, can figure out the relevant features on their own.
This ability makes deep learning particularly powerful for tasks like recognizing
objects in images or understanding speech.

1.2 The Rise of Deep Learning with Internet and Social Media

One big reason why deep learning has taken off is the vast amount of data
available, thanks to the internet and social media. These platforms generate
massive amounts of information every second, ranging from text to images to
videos. Deep learning models thrive on data, and the internet provides an
endless supply. This abundance of data has fueled breakthroughs in areas like
computer vision and natural language processing. Essentially, the internet and
social media have turbocharged the development of deep learning, allowing
researchers and companies to push the boundaries of what's possible.
2. Neural Networks

Neural networks, a cornerstone of modern machine learning, draw inspiration from
the intricate workings of the human brain. Scientists were fascinated by the brain's
ability to process information and learn from experience, leading them to develop
artificial neurons that mimic the behavior of biological neurons.

At its core, a neural network comprises interconnected nodes, or neurons, arranged
in layers. These artificial neurons simulate the functionality of biological neurons by
receiving input signals, processing them through an activation function, and
transmitting output signals to other neurons. The connections between neurons,
often referred to as weights, determine the strength of the signal transmission.

Neural networks typically consist of three types of layers:

Input Layer: This layer receives the initial input data and passes it on to the
subsequent layers for processing. Each neuron in the input layer represents a feature
or attribute of the input data.

Hidden Layer(s): Intermediate layers between the input and output layers where the
majority of computation occurs. Each neuron in a hidden layer receives input from
the previous layer, applies a transformation using activation functions, and passes
the result to the next layer.

Output Layer: The final layer of the neural network responsible for producing the
desired output based on the processed information from the hidden layers. The
number of neurons in the output layer depends on the nature of the task, such as
classification or regression.
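
To make this layer structure concrete, here is a minimal NumPy sketch of one forward
pass through an input layer, a hidden layer, and an output layer (the layer sizes and
random data are illustrative assumptions, not values from this report):

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes: 4 input features, 8 hidden neurons, 3 output neurons
    x = rng.normal(size=(1, 4))            # one example entering the input layer
    W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
    W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

    hidden = np.maximum(0, x @ W1 + b1)    # hidden layer: weighted sum + ReLU activation
    output = hidden @ W2 + b2              # output layer: raw scores for 3 classes/values
    print(output.shape)                    # (1, 3)
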
3. Neural Network Training

Training a neural network involves iteratively adjusting its parameters (weights and
biases) to minimize the difference between predicted and actual outputs, as measured
by a specific loss function.
The weights and biases must be initialized before training begins; the biases are
usually initialized to zeros, while the weights must be initialized to random values.
This optimization process relies on fundamental concepts such as the chain rule,
gradient descent, and the forward and backward propagation algorithms. We discuss
each of these below, along with activation functions.
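
A minimal sketch of this initialization convention (the layer sizes are illustrative
assumptions):

    import numpy as np

    rng = np.random.default_rng(42)

    n_in, n_out = 784, 128                         # illustrative layer sizes
    b = np.zeros(n_out)                            # biases: usually start at zero
    W = rng.normal(0.0, 0.01, size=(n_in, n_out))  # weights: small random values
    # If W started as all zeros, every neuron in the layer would compute the same
    # output and receive the same gradient, so the layer could never break symmetry.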

3.1 The Chain Rule

The chain rule is a fundamental concept in calculus that helps us understand how
changes in multiple variables affect each other. In the context of neural networks, it
allows us to compute the derivatives of complex functions composed of several
nested functions. By applying the chain rule, we can efficiently calculate the
gradients, which indicate the direction and magnitude of changes needed to minimize
the network's error.
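
As a toy illustration (the numbers are made up), the gradient of a nested function
such as a squared error applied to a sigmoid of a weighted input is obtained by
multiplying the local derivatives:

    import numpy as np

    # Toy example: prediction p = sigmoid(w * x), loss L = (p - y)^2.
    # Chain rule: dL/dw = dL/dp * dp/dz * dz/dw, where z = w * x.
    x, y, w = 2.0, 1.0, 0.5

    z = w * x
    p = 1 / (1 + np.exp(-z))         # sigmoid
    dL_dp = 2 * (p - y)              # derivative of the squared error
    dp_dz = p * (1 - p)              # derivative of the sigmoid
    dz_dw = x                        # derivative of the weighted input
    grad = dL_dp * dp_dz * dz_dw     # multiply the pieces together
    print(grad)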

3.2 Gradient Descent

Gradient descent is an optimization algorithm used to minimize a function by
iteratively moving in the direction of the steepest decrease in the function's value. In
the context of neural networks, the function we aim to minimize is the loss function,
which quantifies the disparity between predicted and actual outputs. There are
several variants of gradient descent:

Batch (Full-Batch) Gradient Descent: Updates the parameters using the gradients
computed from the entire training dataset in each step.
Mini-Batch Gradient Descent: Divides the training dataset into small batches
and updates the parameters based on the gradients computed from each
batch.
Stochastic Gradient Descent (SGD): Updates the parameters after processing
each individual training example, making it computationally efficient but more
prone to noise.
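
A minimal sketch of mini-batch gradient descent on a toy one-parameter regression
problem (the data and hyperparameters are illustrative assumptions; setting
batch_size to 1 gives SGD, and setting it to len(X) gives full-batch gradient descent):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 1))                  # toy data: y = 3x + noise
    y = 3 * X[:, 0] + rng.normal(0, 0.1, size=200)

    w, lr, batch_size = 0.0, 0.1, 32
    for epoch in range(20):
        idx = rng.permutation(len(X))              # shuffle before each epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            pred = w * X[batch, 0]
            grad = 2 * np.mean((pred - y[batch]) * X[batch, 0])  # d(MSE)/dw
            w -= lr * grad                         # step along the steepest decrease
    print(w)                                       # converges towards 3
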
3.3 Forward & Back Propagation

Forward propagation involves passing the input data through the neural
network to generate predictions. Each neuron in the network receives input
signals, applies a transformation using weights and biases, and passes the
result to the next layer until the output is obtained.

Backward propagation, also known as backpropagation, is the process of computing
gradients of the loss function with respect to the parameters of the network. It
involves propagating the error backwards from the output layer to the input layer,
using the chain rule to efficiently compute the gradients layer by layer. These
gradients are then used in conjunction with gradient descent to update the
parameters, thereby optimizing the network's performance.
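
A minimal sketch of one forward and one backward pass through a one-hidden-layer
network with a sigmoid output and squared-error loss (all sizes and data are
illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(1, 3))                    # one training example (3 features)
    y = np.array([[1.0]])                          # its target
    W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

    # Forward propagation: input -> hidden -> output
    h = np.maximum(0, x @ W1 + b1)                 # hidden layer (ReLU)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))           # output layer (sigmoid)
    loss = (p - y) ** 2

    # Backward propagation: chain rule applied layer by layer, output to input
    d_out = 2 * (p - y) * p * (1 - p)              # dLoss / d(output pre-activation)
    dW2 = h.T @ d_out
    d_h = (d_out @ W2.T) * (h > 0)                 # push the error back through ReLU
    dW1 = x.T @ d_h

    # Gradient descent update of the parameters
    lr = 0.1
    W2 -= lr * dW2; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * d_h.sum(axis=0)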

3.4 Activation Function

Activation functions play a crucial role in neural networks by introducing nonlinearity
into the model. Without nonlinearity between layers, even a deep stack of layers
would be equivalent to a single layer. This limitation prevents the network from
effectively capturing complex patterns and solving intricate problems.

Activation functions enable neural networks to learn complex mappings between
inputs and outputs by introducing nonlinear transformations. They determine the
output of individual neurons based on their input signals and help the network model
intricate relationships in the data. Common activation functions include sigmoid,
tanh, ReLU (Rectified Linear Unit), and softmax, each with its own characteristics and
suitability for different types of tasks.
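
The following sketch shows how these common activation functions behave on a small
vector of made-up values:

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def relu(z):
        return np.maximum(0, z)

    def softmax(z):
        e = np.exp(z - z.max())          # subtract the max for numerical stability
        return e / e.sum()

    z = np.array([-2.0, 0.0, 3.0])
    print(sigmoid(z))                    # each value squashed into (0, 1)
    print(np.tanh(z))                    # each value squashed into (-1, 1)
    print(relu(z))                       # negative values clipped to 0
    print(softmax(z))                    # non-negative values that sum to 1
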
4. Classification & Regression in Neural Networks

Neural networks are versatile models capable of performing both classification and
regression tasks. These tasks differ in their objectives and the nature of the output
they produce.

4.1 Classification

Classification tasks involve categorizing input data into discrete classes or categories.
The goal is to assign a label or class to each input based on its features. Neural
networks used for classification typically have an output layer with multiple neurons,
each corresponding to a different class. During training, the network learns to predict
the probability distribution over these classes for a given input.

Common examples of classification tasks include image classification (identifying
objects in images), sentiment analysis (determining the sentiment of text), and spam
detection (classifying emails as spam or non-spam).

4.2 Regression

Regression tasks, on the other hand, involve predicting continuous numerical values
based on input features. The objective is to estimate a real-valued output that best
fits the underlying relationship between the input variables. Neural networks used
for regression typically have a single output neuron that directly predicts the
continuous target variable.

Examples of regression tasks include predicting house prices based on features such
as size, location, and number of bedrooms, forecasting stock prices based on
historical data, and estimating the age of a person based on demographic
information.
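
In practice this difference shows up mainly in the network's output layer and loss
function. A hedged Keras-style sketch (the architectures and sizes are illustrative
assumptions, not models from this report):

    import tensorflow as tf

    # Classification head: one output neuron per class, softmax -> class probabilities
    classifier = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),   # e.g. 3 classes
    ])
    classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

    # Regression head: a single output neuron with no activation -> a continuous value
    regressor = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),                         # e.g. a predicted house price
    ])
    regressor.compile(optimizer="adam", loss="mse")
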
4.3 Differences between Classification & Regression

While both classification and regression tasks involve making predictions based on
input data, they differ in several key aspects:

Output Type: In classification, the output is categorical, representing class labels or
probabilities. In regression, the output is continuous, representing numerical values.

Loss Function: Classification tasks often use categorical cross-entropy or binary cross-
entropy loss functions, which measure the difference between predicted class
probabilities and true labels. Regression tasks typically use mean squared error (MSE)
or mean absolute error (MAE) loss functions, which quantify the difference between
predicted and actual numerical values.

Evaluation Metrics: Classification models are evaluated using metrics such as
accuracy, precision, recall, and F1-score, which assess the model's performance in
correctly classifying instances. Regression models are evaluated using metrics such as
mean squared error, mean absolute error, and R-squared, which measure the
accuracy of the predicted numerical values relative to the ground truth.
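
A small numerical illustration of the loss functions mentioned above (the label and
prediction values are made up):

    import numpy as np

    # Classification: categorical cross-entropy between a one-hot label
    # and the predicted class probabilities
    y_true = np.array([0, 1, 0])
    y_prob = np.array([0.2, 0.7, 0.1])
    cross_entropy = -np.sum(y_true * np.log(y_prob))   # = -ln(0.7) ≈ 0.357

    # Regression: mean squared error between predicted and actual values
    pred = np.array([210.0, 305.0])
    actual = np.array([200.0, 300.0])
    mse = np.mean((pred - actual) ** 2)                 # 62.5

    print(cross_entropy, mse)
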
5. Hyperparameters for Neural Networks

Hyperparameters are configuration settings that are external to the model and
cannot be learned from the training data. They define the structure and behavior of
the neural network during training and influence its performance.

5.1 Difference between Hyperparameters & Parameters

Parameters are the internal variables of the model that are learned during
training, such as weights and biases. They directly impact the model's
predictions. In contrast, hyperparameters are settings that govern the training
process itself, such as the learning rate, batch size, and choice of optimizer.
Hyperparameters must be specified by the user before training begins and can
significantly affect the model's performance.

5.2 Learning Rate

The learning rate is a hyperparameter that controls the size of the step taken
during gradient descent optimization. It determines how much the model's
parameters are adjusted in each iteration to minimize the loss function. A high
learning rate may cause the model to overshoot the optimal solution, while a
low learning rate may result in slow convergence. Tuning the learning rate
involves experimenting with different values to find the optimal balance
between convergence speed and stability.

Learning rate schedules, such as exponential decay or step decay, adjust the
learning rate over time to improve convergence. Learning rate warm-up
techniques gradually increase the learning rate at the beginning of training to
accelerate convergence while mitigating the risk of instability.
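
A minimal sketch of the two schedule shapes mentioned above (the starting rate and
decay constants are illustrative assumptions):

    initial_lr = 0.1                                  # illustrative starting learning rate

    def exponential_decay(epoch, rate=0.9):
        return initial_lr * rate ** epoch             # smooth multiplicative decay

    def step_decay(epoch, drop=0.5, every=10):
        return initial_lr * drop ** (epoch // every)  # drop the rate every 10 epochs

    for epoch in (0, 5, 10, 20):
        print(epoch, round(exponential_decay(epoch), 4), step_decay(epoch))
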
5.3 Batch Size

The batch size determines the number of training examples used in each
iteration of gradient descent. Choosing an appropriate batch size is crucial for
efficient training. A smaller batch size may result in noisy gradients but faster
convergence, while a larger batch size may provide more stable gradients but
slower convergence.

Common approaches for selecting the batch size include using a default value
like 32, which is commonly used in practice, or adjusting it based on GPU
memory constraints. Learning rate warm-up techniques can help mitigate the
potential negative effects of larger batch sizes by gradually increasing the
learning rate during the initial training epochs.
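
As a quick back-of-the-envelope check, the batch size directly sets how many
parameter updates happen per epoch (the dataset size here is an illustrative
assumption):

    n_examples, batch_size = 50_000, 32                # 32 is the common default
    updates_per_epoch = -(-n_examples // batch_size)   # ceiling division
    print(updates_per_epoch)                           # 1563 updates per epoch
    # Smaller batches -> more, noisier updates per epoch and less memory per step;
    # larger batches -> fewer, smoother updates but more GPU memory per step.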

5.4 Optimizer Function

The optimizer function is responsible for updating the model's parameters during
training based on the computed gradients. There are various optimizer algorithms
available, each with its own strengths and weaknesses. Popular optimizers include
stochastic gradient descent (SGD) and Adam.

Choosing the best optimizer for a specific problem involves experimentation and
depends on factors such as the dataset size, model architecture, and convergence
speed requirements. It's often advisable to start with a well-established optimizer
like Adam and fine-tune its hyperparameters if necessary.
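
A hedged Keras-style sketch of swapping optimizers (the model architecture and
learning rates are illustrative assumptions):

    import tensorflow as tf

    # Placeholder model; the architecture is only an assumption for the example
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])

    # Start with a well-established default such as Adam...
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # ...and switching to another optimizer is a one-line change, e.g.:
    # model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9), ...)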

5.5 Activation Function

Activation functions introduce nonlinearity into the neural network, enabling it to
learn complex relationships in the data. Choosing the appropriate activation function
depends on the task and the properties of the data. Rectified Linear Unit (ReLU) is
commonly used in hidden layers due to its simplicity and effectiveness in mitigating
the vanishing gradient problem. Sigmoid and softmax activation functions are suitable
for output layers in binary and multiclass classification tasks, respectively. However,
using sigmoid activation functions in hidden layers may lead to vanishing or exploding
gradients, hindering training stability.
5.6 Number of iterations (Epochs)

The number of epochs specifies the total number of iterations over the entire
training dataset during training. While the number of epochs is not typically
fine-tuned, it's important to set it to a sufficiently high value to allow the
model to converge to an optimal solution. Early stopping techniques can be
employed to monitor the model's performance on a validation set and stop
training when performance begins to deteriorate, preventing overfitting.
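
A hedged Keras-style sketch of early stopping (model, X_train and y_train are assumed
to exist; the patience value is an illustrative assumption):

    import tensorflow as tf

    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",              # watch performance on the validation data
        patience=5,                      # stop after 5 epochs with no improvement
        restore_best_weights=True,       # roll the weights back to the best epoch
    )

    # The epoch count is set generously high; early stopping decides when to end.
    history = model.fit(X_train, y_train,
                        epochs=200,
                        batch_size=32,
                        validation_split=0.2,
                        callbacks=[early_stop])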
