Experiment 2.4 DL
1. Deep Neural Networks (DNNs)
Key Concepts:
• Neurons & Layers: Each neuron receives inputs, processes them, and passes the output to
the next layer. DNNs typically consist of an input layer, one or more hidden layers, and an
output layer.
• Activation Functions: Functions that introduce non-linearity into the model. Common
activation functions include:
o ReLU (Rectified Linear Unit): f(x) = max(0, x). It allows only positive values to
pass and is computationally efficient.
o Softmax: Used in the output layer for multi-class classification, converting raw
scores into probabilities for each class.
• Forward Propagation: The process of passing inputs through the network to obtain
outputs.
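As a minimal sketch of these ideas, the forward pass of a network with two hidden layers might look as follows (the function names, the weight/bias names w1–w3 and b1–b3, and the use of NumPy are illustrative assumptions, not code prescribed by this experiment):

import numpy as np

def relu(x):
    # ReLU activation: pass positive values through, zero out the rest
    return np.maximum(0, x)

def softmax(x):
    # Convert raw scores to class probabilities (shifting by the row max for stability)
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

def forward(x, w1, b1, w2, b2, w3, b3):
    # Input layer -> hidden layer 1 -> hidden layer 2 -> output layer
    h1 = relu(np.dot(x, w1) + b1)
    h2 = relu(np.dot(h1, w2) + b2)
    y_pred = softmax(np.dot(h2, w3) + b3)
    return h1, h2, y_pred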
2. Loss Functions
Loss functions measure how well the neural network's predictions match the actual target values.
The goal of training is to minimize this loss.
• Cross-Entropy Loss: Commonly used for classification tasks. It measures the performance
of a model whose output is a probability value between 0 and 1. Mathematically for a single
instance:
Loss = −∑_{i=1}^{C} y_i · log(p_i)
where y_i is the true label (one-hot encoded) and p_i is the predicted probability for class i.
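A small sketch of this loss in NumPy (the epsilon clipping and the example values are illustrative assumptions):

import numpy as np

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    # y_true: one-hot labels, shape (N, C); y_pred: predicted probabilities, shape (N, C)
    y_pred = np.clip(y_pred, eps, 1.0)                          # avoid log(0)
    return -np.sum(y_true * np.log(y_pred)) / y_true.shape[0]   # mean loss over the batch

# Example: the true class is index 1 and the model assigns it probability 0.7
y_true = np.array([[0.0, 1.0, 0.0]])
y_pred = np.array([[0.2, 0.7, 0.1]])
print(cross_entropy_loss(y_true, y_pred))                       # ≈ 0.357, i.e. −log(0.7)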
3. Backpropagation Algorithm
Backpropagation is the primary method for training neural networks. It involves:
• Computing the gradient of the loss function with respect to each weight using the chain rule.
• Propagating the gradients backward through the network to update weights via gradient
descent.
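As a minimal sketch of the chain rule for a single linear layer feeding a softmax output with cross-entropy loss (the function and variable names below are illustrative; this is not the full multi-layer code of the experiment):

import numpy as np

# z = x·W + b,  p = softmax(z),  Loss = −Σ y·log(p)
# For this pairing, dLoss/dz simplifies to (p − y); the weight gradient follows by the chain rule.
def backward_single_layer(x, y_true, p):
    d_z = p - y_true            # gradient of the loss w.r.t. the pre-activations z
    d_W = np.dot(x.T, d_z)      # dLoss/dW = x^T · dLoss/dz
    d_b = np.sum(d_z, axis=0)   # dLoss/db, summed over the batch
    return d_W, d_b

This (p − y) term is exactly the d_y_pred = y_pred - y_train line in the backward-pass code shown later in this report.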
4. Gradient Descent
A common optimization algorithm used to update the weights in the direction that most reduces the
loss function.
• Learning Rate: A hyperparameter that controls how much to change the model in response
to the estimated error each time the weights are updated. A small learning rate can lead to
slow convergence, while a large one might overshoot minima.
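A minimal sketch of the update rule on a one-dimensional toy loss (the quadratic, learning rate, and step count are illustrative assumptions):

# Vanilla gradient descent on f(w) = (w - 3)^2
lr = 0.1                      # learning rate (hyperparameter)
w = 0.0                       # initial weight
for step in range(50):
    grad = 2 * (w - 3)        # df/dw
    w -= lr * grad            # move against the gradient
print(w)                      # converges toward the minimum at w = 3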
5. One-Hot Encoding
A method for converting categorical variables into a binary matrix representation. Each category
value is converted into a binary vector that is all zeros except for the index of the category, which is
marked with a one. It's particularly useful for feeding categorical data into neural networks.
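A small NumPy sketch (the label values and class count are made up for illustration):

import numpy as np

labels = np.array([0, 2, 1, 2])            # integer class labels
num_classes = 3
one_hot = np.eye(num_classes)[labels]      # each row: all zeros except a 1 at the label's index
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]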
6. Regularization Techniques
Methods that reduce overfitting during training.
• Dropout: Randomly zeros a fraction of the neurons during training to prevent co-adaptation.
• L2 Regularization: Adds a penalty to the loss function based on the magnitude of the
weights, discouraging overly complex models.
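A minimal sketch of both techniques (the dropout rate, regularization strength lam, and weight list are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)

def dropout(h, rate=0.5, training=True):
    # Zero out a random fraction of activations; scale the survivors (inverted dropout)
    if not training:
        return h
    mask = (rng.random(h.shape) > rate).astype(h.dtype)
    return h * mask / (1.0 - rate)

def l2_penalty(weights, lam=1e-4):
    # Sum of squared weights, scaled by the regularization strength
    return lam * sum(np.sum(w ** 2) for w in weights)

# Example: add the penalty to the data loss before backpropagation
# total_loss = cross_entropy_loss(y_true, y_pred) + l2_penalty([w1, w2, w3])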
import numpy as np

def softmax(x):
    # Subtract the row-wise max before exponentiating for improved numerical stability
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)
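For example, with illustrative scores, each output row sums to 1:

scores = np.array([[2.0, 1.0, 0.1]])
print(softmax(scores))    # ≈ [[0.659 0.242 0.099]]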
# Backward pass: gradients flow from the output layer back through the hidden layers
d_y_pred = y_pred - y_train                            # softmax + cross-entropy gradient w.r.t. the output scores
d_h2 = np.dot(d_y_pred, w3.T) * relu_derivative(h2)    # gradient at the second hidden layer (through the ReLU)
d_h1 = np.dot(d_h2, w2.T) * relu_derivative(h1)        # gradient at the first hidden layer (through the ReLU)
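The fragment above stops at the hidden-layer gradients; a sketch of the remaining weight gradients and the gradient descent update, assuming the same variable names plus an input batch x_train and a learning rate lr (both assumptions, not values recorded here), would be:

d_w3 = np.dot(h2.T, d_y_pred)     # gradient of the loss w.r.t. w3
d_w2 = np.dot(h1.T, d_h2)         # gradient w.r.t. w2
d_w1 = np.dot(x_train.T, d_h1)    # gradient w.r.t. w1 (x_train is the input batch)

# Gradient descent update (lr is the learning rate from Section 4)
w3 -= lr * d_w3
w2 -= lr * d_w2
w1 -= lr * d_w1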
Output:
Conclusion:
Understanding the theoretical foundations of neural networks provides insights into how they work
and the challenges faced during training. This knowledge is critical for optimizing performance and
effectively applying DNNs in various applications such as image classification, natural language
processing, and more.