SHAI - Task 3 - NN
In today's era of artificial intelligence (AI), two major approaches stand out: machine
learning and deep learning. While machine learning has been around for some time,
deep learning has revolutionized how we tackle complex problems. This introduction
aims to highlight the main differences between the two and explain why deep
learning has become so important, especially with the explosion of data from the
internet and social media.
1.1 The Main Difference between Machine Learning & Deep Learning
Machine learning and deep learning both teach computers to learn from data.
However, they differ in how they handle the data. Traditional machine learning
often requires humans to manually pick out important features from the data.
In contrast, deep learning does this automatically. Deep neural networks, the
backbone of deep learning, can figure out the relevant features on their own.
This ability makes deep learning particularly powerful for tasks like recognizing
objects in images or understanding speech.
1.2 The Rise of Deep Learning with Internet and Social Media
One big reason why deep learning has taken off is the vast amount of data
available, thanks to the internet and social media. These platforms generate
massive amounts of information every second, ranging from text to images to
videos. Deep learning models thrive on data, and the internet provides an
endless supply. This abundance of data has fueled breakthroughs in areas like
computer vision and natural language processing. Essentially, the internet and
social media have turbocharged the development of deep learning, allowing
researchers and companies to push the boundaries of what's possible.
2. Neural Networks
Input Layer: This layer receives the initial input data and passes it on to the
subsequent layers for processing. Each neuron in the input layer represents a feature
or attribute of the input data.
Hidden Layer(s): Intermediate layers between the input and output layers where the
majority of computation occurs. Each neuron in a hidden layer receives input from
the previous layer, applies a transformation using activation functions, and passes
the result to the next layer.
Output Layer: The final layer of the neural network responsible for producing the
desired output based on the processed information from the hidden layers. The
number of neurons in the output layer depends on the nature of the task, such as
classification or regression.
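To make the roles of these layers concrete, here is a minimal NumPy sketch of a fully connected network with one hidden layer. The layer sizes (4 inputs, 8 hidden neurons, 3 outputs), the ReLU activation, and the variable names are illustrative assumptions, not values from this report.

    import numpy as np

    # Illustrative layer sizes: 4 input features, 8 hidden neurons, 3 outputs.
    n_input, n_hidden, n_output = 4, 8, 3

    # Each layer is defined by a weight matrix and a bias vector.
    W1 = np.random.randn(n_input, n_hidden) * 0.01   # input  -> hidden
    b1 = np.zeros(n_hidden)
    W2 = np.random.randn(n_hidden, n_output) * 0.01  # hidden -> output
    b2 = np.zeros(n_output)

    x = np.random.rand(n_input)     # one example; each entry is one feature

    h = np.maximum(0, x @ W1 + b1)  # hidden layer: linear step + ReLU activation
    y = h @ W2 + b2                 # output layer: one value per class / target
    print(y.shape)                  # (3,)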
3. Neural Network Training
Training a neural network involves iteratively adjusting its parameters (weights and biases) to minimize the difference between predicted and actual outputs, as measured by a specific loss function.
The weights and biases must be initialized before training begins. The biases are usually initialized to zero, while the weights must be initialized to small random values so that neurons in the same layer do not all learn the same thing.
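As an illustration of this initialization rule, the sketch below builds one layer with zero biases and small random weights; the 1/sqrt(n_in) scaling is just one common (Xavier-style) choice assumed here, not a requirement.

    import numpy as np

    def init_layer(n_in, n_out, rng=np.random.default_rng(0)):
        """Initialize one fully connected layer.

        Biases start at zero; weights start at small random values so that
        neurons in the same layer do not all compute identical outputs.
        The 1/sqrt(n_in) scaling is one common (Xavier-style) choice,
        used here only as an illustration.
        """
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        return W, b

    W1, b1 = init_layer(4, 8)   # input  -> hidden
    W2, b2 = init_layer(8, 3)   # hidden -> output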
This optimization process relies on fundamental concepts such as the chain rule, gradient descent, forward and back propagation, and activation functions; each of these is discussed below.
Full-Batch Gradient Descent: Updates the parameters using the gradients
computed from the entire training dataset.
Mini-Batch Gradient Descent: Divides the training dataset into small batches
and updates the parameters based on the gradients computed from each
batch.
Stochastic Gradient Descent (SGD): Updates the parameters after processing
each individual training example, making it computationally efficient but more
prone to noise.
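The three variants differ only in how many examples contribute to each parameter update. The following sketch uses an invented linear model with a mean-squared-error loss purely to show the three update loops; the dataset, learning rate, and batch size of 32 are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))            # toy dataset: 100 examples, 3 features
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

    w = np.zeros(3)                          # parameters of a linear model
    lr = 0.1

    def gradient(w, Xb, yb):
        """Gradient of mean squared error for a linear model on batch (Xb, yb)."""
        return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

    # Full-batch gradient descent: one update per pass, using every example.
    w -= lr * gradient(w, X, y)

    # Mini-batch gradient descent: one update per batch of (say) 32 examples.
    for start in range(0, len(X), 32):
        Xb, yb = X[start:start + 32], y[start:start + 32]
        w -= lr * gradient(w, Xb, yb)

    # Stochastic gradient descent: one update per individual training example.
    for i in range(len(X)):
        w -= lr * gradient(w, X[i:i + 1], y[i:i + 1])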
3.3 Forward & Back Propagation
Forward propagation involves passing the input data through the neural
network to generate predictions. Each neuron in the network receives input
signals, applies a transformation using weights and biases, and passes the
result to the next layer until the output is obtained.
Back propagation works in the opposite direction: the error at the output is
propagated backwards through the network, and the chain rule is used to
compute the gradient of the loss with respect to every weight and bias,
which gradient descent then uses to update the parameters.
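Putting the two directions together, the sketch below runs one forward pass and one backward pass through a tiny one-hidden-layer network. The squared-error loss, ReLU activation, layer sizes, and learning rate are all assumed for illustration; the backward pass simply applies the chain rule layer by layer.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=4)            # one training example (4 features)
    t = np.array([1.0])               # its target value

    # Parameters of a tiny network: 4 -> 3 hidden (ReLU) -> 1 output.
    W1, b1 = rng.normal(size=(4, 3)) * 0.1, np.zeros(3)
    W2, b2 = rng.normal(size=(3, 1)) * 0.1, np.zeros(1)

    # ---- Forward propagation: compute the prediction layer by layer ----
    z1 = x @ W1 + b1                  # hidden pre-activation
    h = np.maximum(0, z1)             # hidden activation (ReLU)
    y = h @ W2 + b2                   # network output
    loss = 0.5 * np.sum((y - t) ** 2) # squared-error loss

    # ---- Back propagation: apply the chain rule from the loss backwards ----
    dy = y - t                        # dL/dy
    dW2 = np.outer(h, dy)             # dL/dW2
    db2 = dy                          # dL/db2
    dh = W2 @ dy                      # propagate the error to the hidden layer
    dz1 = dh * (z1 > 0)               # multiply by the ReLU derivative
    dW1 = np.outer(x, dz1)            # dL/dW1
    db1 = dz1                         # dL/db1

    # ---- Gradient descent step ----
    lr = 0.01
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2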
4. Classification & Regression
Neural networks are versatile models capable of performing both classification and
regression tasks. These tasks differ in their objectives and the nature of the output
they produce.
4.1 Classification
Classification tasks involve categorizing input data into discrete classes or categories.
The goal is to assign a label or class to each input based on its features. Neural
networks used for classification typically have an output layer with multiple neurons,
each corresponding to a different class. During training, the network learns to predict
the probability distribution over these classes for a given input.
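A minimal sketch of such an output layer, assuming three invented classes and made-up raw scores: a softmax turns the scores into a probability distribution and the most probable class is chosen.

    import numpy as np

    def softmax(z):
        z = z - z.max()               # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    classes = ["cat", "dog", "bird"]        # illustrative class labels
    logits = np.array([2.0, 0.5, -1.0])     # raw output-layer scores for one input

    probs = softmax(logits)                 # predicted probability per class
    predicted = classes[int(np.argmax(probs))]
    print(probs, predicted)                 # approximately [0.79 0.18 0.04] 'cat'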
4.2 Regression
Regression tasks, on the other hand, involve predicting continuous numerical values
based on input features. The objective is to estimate a real-valued output that best
fits the underlying relationship between the input variables. Neural networks used
for regression typically have a single output neuron that directly predicts the
continuous target variable.
Examples of regression tasks include predicting house prices based on features such
as size, location, and number of bedrooms, forecasting stock prices based on
historical data, and estimating the age of a person based on demographic
information.
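As an illustration of the single-output setup, the sketch below collapses the whole network into one linear output neuron; the house features, weights, and prices are all invented numbers.

    import numpy as np

    # One house described by illustrative features: [size_m2, bedrooms, distance_km].
    x = np.array([120.0, 3.0, 5.0])

    # A regression network ends in a single output neuron; here the "network"
    # is reduced to one linear neuron purely for illustration.
    w = np.array([2.5, 10.0, -4.0])   # made-up learned weights
    b = 50.0                          # made-up learned bias

    price = x @ w + b                 # continuous prediction (e.g. price in thousands)
    error = (price - 355.0) ** 2      # squared error against a made-up true price
    print(price, error)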
4.3 Differences between Classification & Regression
While both classification and regression tasks involve making predictions based on
input data, they differ in several key aspects:
Loss Function: Classification tasks often use categorical cross-entropy or binary cross-
entropy loss functions, which measure the difference between predicted class
probabilities and true labels. Regression tasks typically use mean squared error (MSE)
or mean absolute error (MAE) loss functions, which quantify the difference between
predicted and actual numerical values.
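The difference between the two families of loss functions can be seen on a couple of small, made-up arrays:

    import numpy as np

    # Classification: categorical cross-entropy between a true one-hot label
    # and predicted class probabilities (values invented for illustration).
    y_true = np.array([0.0, 1.0, 0.0])          # true class is the second one
    y_prob = np.array([0.1, 0.7, 0.2])          # predicted probabilities
    cross_entropy = -np.sum(y_true * np.log(y_prob))   # = -log(0.7), about 0.36

    # Regression: mean squared / absolute error between predicted and true values.
    pred = np.array([2.5, 0.0, 2.1])
    true = np.array([3.0, -0.5, 2.0])
    mse = np.mean((pred - true) ** 2)            # about 0.17
    mae = np.mean(np.abs(pred - true))           # about 0.37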
5. Hyperparameters
Hyperparameters are configuration settings that are external to the model and
cannot be learned from the training data. They define the structure and behavior of
the neural network during training and influence its performance.
5.1 Parameters vs. Hyperparameters
Parameters are the internal variables of the model that are learned during
training, such as weights and biases. They directly impact the model's
predictions. In contrast, hyperparameters are settings that govern the training
process itself, such as the learning rate, batch size, and choice of optimizer.
Hyperparameters must be specified by the user before training begins and can
significantly affect the model's performance.
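A small sketch of this distinction, with invented values: the first dictionary holds settings chosen by the user before training, the second holds quantities the network learns by itself.

    import numpy as np

    # Hyperparameters: chosen by the user before training starts.
    hyperparams = {
        "learning_rate": 0.01,   # step size for gradient descent
        "batch_size": 32,        # examples per parameter update
        "epochs": 20,            # passes over the training set
        "optimizer": "sgd",      # which update rule to use
    }

    # Parameters: learned by the network itself during training.
    params = {
        "W1": np.random.randn(4, 8) * 0.01,  # weights, updated by gradient descent
        "b1": np.zeros(8),                   # biases, updated by gradient descent
    }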
5.2 Learning Rate
The learning rate is a hyperparameter that controls the size of the step taken
during gradient descent optimization. It determines how much the model's
parameters are adjusted in each iteration to minimize the loss function. A high
learning rate may cause the model to overshoot the optimal solution, while a
low learning rate may result in slow convergence. Tuning the learning rate
involves experimenting with different values to find the optimal balance
between convergence speed and stability.
Learning rate schedules, such as exponential decay or step decay, adjust the
learning rate over time to improve convergence. Learning rate warm-up
techniques gradually increase the learning rate at the beginning of training to
accelerate convergence while mitigating the risk of instability.
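Sketches of these three strategies are shown below; the decay rates, drop interval, and warm-up length are assumed values, not recommendations from this report.

    import math

    def exponential_decay(lr0, epoch, k=0.1):
        """Smoothly decaying learning rate: lr0 * exp(-k * epoch)."""
        return lr0 * math.exp(-k * epoch)

    def step_decay(lr0, epoch, drop=0.5, every=10):
        """Learning rate multiplied by `drop` every `every` epochs."""
        return lr0 * drop ** (epoch // every)

    def linear_warmup(lr_target, epoch, warmup_epochs=5):
        """Ramp the learning rate up over the first few epochs, then hold it."""
        return lr_target * min(1.0, (epoch + 1) / warmup_epochs)

    for epoch in range(12):
        print(epoch,
              round(exponential_decay(0.1, epoch), 4),
              round(step_decay(0.1, epoch), 4),
              round(linear_warmup(0.1, epoch), 4))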
5.3 Batch Size
The batch size determines the number of training examples used in each
iteration of gradient descent. Choosing an appropriate batch size is crucial for
efficient training. A smaller batch size may result in noisy gradients but faster
convergence, while a larger batch size may provide more stable gradients but
slower convergence.
Common approaches for selecting the batch size include using a default value
like 32, which is commonly used in practice, or adjusting it based on GPU
memory constraints. Learning rate warm-up techniques can help mitigate the
potential negative effects of larger batch sizes by gradually increasing the
learning rate during the initial training epochs.
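A small sketch of how a chosen batch size splits a toy dataset of 100 examples into mini-batches; shuffling once per epoch is a common practice assumed here.

    import numpy as np

    X = np.arange(100).reshape(100, 1)   # toy dataset with 100 examples
    batch_size = 32                      # a common default value

    indices = np.random.permutation(len(X))      # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = X[indices[start:start + batch_size]]
        # With 100 examples and batch_size 32 this yields batches of
        # 32, 32, 32 and 4 examples, i.e. 4 parameter updates per epoch.
        print(batch.shape)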
5.4 Number of Epochs
The number of epochs specifies the number of complete passes made over the entire
training dataset during training. While the number of epochs is not typically
fine-tuned, it's important to set it to a sufficiently high value to allow the
model to converge to an optimal solution. Early stopping techniques can be
employed to monitor the model's performance on a validation set and stop
training when performance begins to deteriorate, preventing overfitting.
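A sketch of early stopping is shown below; the training and validation losses are simulated stand-ins for a real model, and the patience value of 5 is an arbitrary choice.

    import random

    random.seed(0)

    def train_one_epoch():
        """Stand-in for a real training pass (the loss values here are simulated)."""
        return random.random()

    def validation_loss(epoch):
        """Simulated validation loss: improves early, then starts to deteriorate."""
        return (epoch - 12) ** 2 / 100 + random.random() * 0.05

    best_val, patience, bad_epochs = float("inf"), 5, 0
    max_epochs = 200                      # set high; early stopping ends training sooner

    for epoch in range(max_epochs):
        train_one_epoch()
        val = validation_loss(epoch)
        if val < best_val:
            best_val, bad_epochs = val, 0     # validation improved, keep training
        else:
            bad_epochs += 1                   # validation got worse
            if bad_epochs >= patience:
                print(f"Early stopping at epoch {epoch}")
                break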