
Report on the Training Process of the Neural Network Model

1. Introduction

This report explains the training process of a neural network implemented in the provided code. The model consists of a single hidden layer and uses the following key components:

- Sigmoid activation function in the hidden layer.
- Softmax function in the output layer for multi-class classification.
- Cross-entropy loss as the objective function.

We will also delve into how the derivatives of the loss function are computed with respect to weights and biases, which form the basis of the backpropagation
process.

2. Model Architecture

- Input Layer: Accepts data with ( M ) features.
- Hidden Layer: Consists of ( D ) hidden units.
- Output Layer: Produces ( K ) outputs corresponding to ( K ) classes using the softmax function.
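
As a concrete illustration of these shapes, the parameters might be laid out as in the following minimal numpy sketch. The values of M, D, K and the 0.01 scale are assumed examples, not taken from the provided code; the extra column in each matrix holds the bias weights.

```python
import numpy as np

M, D, K = 4, 16, 3  # example feature, hidden-unit, and class counts (assumed values)

# alpha: hidden-layer weights, one row per hidden unit, one column per input feature plus bias
alpha = np.random.randn(D, M + 1) * 0.01
# beta: output-layer weights, one row per class, one column per hidden unit plus bias
beta = np.random.randn(K, D + 1) * 0.01
```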

3. Forward Pass

The forward pass computes the predictions for each data sample using the following steps (a code sketch follows the list):

1. Input Augmentation: Adds a bias term to the input features.
2. Linear Transformation: ( a = X_{\text{bias}} \cdot \alpha^T ), where ( \alpha ) is the weight matrix for the hidden layer.
3. Activation: The sigmoid function ( z = \sigma(a) ) is applied to compute the activations of the hidden layer.
4. Augmented Hidden Outputs: Bias is added to ( z ) for input into the output layer.
5. Output Transformation: ( b = z_{\text{bias}} \cdot \beta^T ), where ( \beta ) is the weight matrix for the output layer.
6. Softmax Activation: The softmax function ( \hat{y} = \text{softmax}(b) ) computes the class probabilities.
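
A minimal forward pass for a single sample, consistent with the six steps above, might look like the following sketch (numpy is assumed; `x` is a length-( M ) feature vector and `alpha`, `beta` are the weight matrices from Section 2):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(b):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(b - np.max(b))
    return e / e.sum()

def forward(x, alpha, beta):
    x_bias = np.concatenate(([1.0], x))   # 1. input augmentation
    a = alpha @ x_bias                    # 2. linear transformation
    z = sigmoid(a)                        # 3. hidden activations
    z_bias = np.concatenate(([1.0], z))   # 4. augmented hidden outputs
    b = beta @ z_bias                     # 5. output transformation
    y_hat = softmax(b)                    # 6. class probabilities
    return x_bias, z_bias, y_hat
```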

4. Loss Function

The cross-entropy loss for a dataset of size ( N ) is: [ \text{Loss} = -\frac{1}{N} \sum_{i=1}^N \log(\hat{y}_{i, y_i}) ] where ( \hat{y}_{i, y_i} ) is the predicted probability of the true class ( y_i ).
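
For a whole dataset this loss can be computed in a few lines (a sketch, assuming `y_hat` is an ( N \times K ) array of softmax outputs and `y` is the integer vector of true class labels):

```python
import numpy as np

def cross_entropy(y_hat, y):
    # probability the model assigned to each sample's true class
    n = y_hat.shape[0]
    true_class_probs = y_hat[np.arange(n), y]
    return -np.mean(np.log(true_class_probs))
```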

5. Backward Pass (Gradient Computation)

The backward pass calculates gradients of the loss with respect to the weights and biases so they can be updated during training. The following steps are performed (see the sketch after this list):

1. Output Layer Gradients:
   - Error at Output Layer: [ \delta_{\text{output}} = \hat{y} - y_{\text{one-hot}} ] where ( y_{\text{one-hot}} ) is the one-hot encoding of the true labels.
   - Gradient of Output Weights: [ \nabla_{\beta} = \frac{1}{N} \delta_{\text{output}}^T \cdot z_{\text{bias}} ]

2. Hidden Layer Gradients:
   - Error at Hidden Layer: [ \delta_{\text{hidden}} = (\delta_{\text{output}} \cdot \beta_{\text{no-bias}}) \odot \sigma'(a) ] where ( \beta_{\text{no-bias}} ) excludes the bias weights and ( \sigma'(a) = z \odot (1 - z) ) is the derivative of the sigmoid expressed in terms of the hidden activations ( z ).
   - Gradient of Hidden Weights: [ \nabla_{\alpha} = \frac{1}{N} \delta_{\text{hidden}}^T \cdot X_{\text{bias}} ]
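
Combining these formulas for a single sample (so the ( \frac{1}{N} ) averaging is applied over the batch elsewhere), the gradient computation might look like this sketch, reusing the variables returned by the forward pass above:

```python
import numpy as np

def backward(x_bias, z_bias, y_hat, y_one_hot, beta):
    # error at the output layer: predicted probabilities minus one-hot targets
    delta_output = y_hat - y_one_hot                               # shape (K,)
    # gradient of the output weights: outer product with the augmented hidden outputs
    grad_beta = np.outer(delta_output, z_bias)                     # shape (K, D+1)
    # back-propagate through beta, dropping its bias column, then apply the sigmoid derivative
    z = z_bias[1:]
    delta_hidden = (beta[:, 1:].T @ delta_output) * z * (1.0 - z)  # shape (D,)
    # gradient of the hidden weights
    grad_alpha = np.outer(delta_hidden, x_bias)                    # shape (D, M+1)
    return grad_alpha, grad_beta
```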

6. Parameter Updates

Using stochastic gradient descent (SGD), the weights are updated as: [ \alpha \leftarrow \alpha - \eta \nabla_{\alpha}, \quad \beta \leftarrow \beta - \eta
\nabla_{\beta} ] where ( \eta ) is the learning rate.
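
In code, the update is a single step per parameter matrix (sketch; `lr` plays the role of ( \eta )):

```python
def sgd_step(alpha, beta, grad_alpha, grad_beta, lr=0.1):
    # move each parameter matrix against its gradient
    alpha = alpha - lr * grad_alpha
    beta = beta - lr * grad_beta
    return alpha, beta
```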

7. Training Process

The train_and_test function performs the following steps (a sketch of the loop follows the list):

1. Weight Initialization: Random or zero initialization.
2. Epoch Loop: Repeats for the specified number of epochs:
   - Performs a forward pass for each sample.
   - Computes the gradients via the backward pass.
   - Updates the weights and biases.
   - Computes and stores the loss on the training and validation datasets.
3. Evaluation: After training, the model computes the error rate and makes predictions on both training and validation datasets.
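
Putting the pieces together, the loop described above could be sketched as follows. The name train_and_test comes from the report, but the exact signature is an assumption, and forward, backward, sgd_step, and cross_entropy refer to the illustrative sketches from the earlier sections:

```python
import numpy as np

def predict(X, alpha, beta):
    # most probable class for each sample
    return np.array([np.argmax(forward(x, alpha, beta)[2]) for x in X])

def dataset_loss(X, y, alpha, beta):
    probs = np.array([forward(x, alpha, beta)[2] for x in X])
    return cross_entropy(probs, y)

def train_and_test(X_train, y_train, X_val, y_val, D, K, epochs=50, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    M = X_train.shape[1]
    alpha = rng.normal(scale=0.01, size=(D, M + 1))   # 1. weight initialization
    beta = rng.normal(scale=0.01, size=(K, D + 1))

    train_losses, val_losses = [], []
    for _ in range(epochs):                           # 2. epoch loop
        for x, y in zip(X_train, y_train):
            x_bias, z_bias, y_hat = forward(x, alpha, beta)
            y_one_hot = np.eye(K)[y]
            g_alpha, g_beta = backward(x_bias, z_bias, y_hat, y_one_hot, beta)
            alpha, beta = sgd_step(alpha, beta, g_alpha, g_beta, lr)
        train_losses.append(dataset_loss(X_train, y_train, alpha, beta))
        val_losses.append(dataset_loss(X_val, y_val, alpha, beta))

    # 3. evaluation: error rates and predicted labels
    train_pred, val_pred = predict(X_train, alpha, beta), predict(X_val, alpha, beta)
    train_err = np.mean(train_pred != y_train)
    val_err = np.mean(val_pred != y_val)
    return train_losses, val_losses, train_err, val_err, train_pred, val_pred
```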
8. Results

The function outputs:

- Training and Validation Loss per epoch.
- Final Training and Validation Errors.
- Predicted Labels for the training and validation datasets.

9. Conclusion

The training process effectively leverages backpropagation to optimize the weights and biases by minimizing the cross-entropy loss. Because the gradients of the loss are computed with respect to every weight and bias, each parameter is adjusted in the direction that reduces the classification error over time.
