Deep Learning Unit 2
• Definition: Deep Feedforward Neural Networks are a type of artificial neural network in which connections between the nodes do not form a cycle; they are the simplest form of neural network.
• Architecture: Consists of an input layer, several hidden layers, and an output layer.
• Activation Functions: Commonly used activation functions include Sigmoid, Tanh, and
ReLU.
• Forward Propagation: Involves calculating the output of each neuron, layer by layer, from the input layer to the output layer (see the sketch after this list).
• Use Cases: Image and speech recognition, language translation, and other applications
requiring pattern recognition.
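A minimal forward-propagation sketch in NumPy, assuming one ReLU hidden layer and a sigmoid output layer; the layer sizes and random weights are purely illustrative:

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # One pass from input to output: hidden layer (ReLU), then output layer (sigmoid).
    h = relu(W1 @ x + b1)
    return sigmoid(W2 @ h + b2)

# Toy dimensions: 3 inputs, 4 hidden units, 1 output.
rng = np.random.default_rng(0)
x  = rng.normal(size=3)
W1 = rng.normal(size=(4, 3)); b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4)); b2 = np.zeros(1)
print(forward(x, W1, b1, W2, b2))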
• Gradient Descent Variants:
o Batch Gradient Descent: Uses the entire dataset for each update.
o Mini-batch Gradient Descent: Uses a small random subset of the dataset for each update.
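The contrast between the two variants can be sketched on an assumed least-squares problem; the data, learning rate, and batch size below are illustrative placeholders, not values from the notes:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

def grad(w, Xb, yb):
    # Gradient of the mean squared error over the batch (Xb, yb).
    return 2.0 / len(yb) * Xb.T @ (Xb @ w - yb)

w_batch, w_mini, lr = np.zeros(5), np.zeros(5), 0.05
for step in range(200):
    # Batch GD: one update per pass over the entire dataset.
    w_batch -= lr * grad(w_batch, X, y)
    # Mini-batch GD: one update per small random subset.
    idx = rng.choice(len(y), size=16, replace=False)
    w_mini -= lr * grad(w_mini, X[idx], y[idx])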
3. Momentum-Based GD
• Definition: Accumulates an exponentially decaying moving average (velocity) of past gradients and moves in that direction, which damps oscillations and speeds up convergence.
4. Nesterov Accelerated GD
• Definition: An improved version of momentum-based GD that looks ahead to the
estimated future position.
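Both update rules can be sketched side by side on an assumed quadratic objective; the gradient function, learning rate, and momentum coefficient are illustrative assumptions:

import numpy as np

def grad(w):
    # Hypothetical quadratic objective f(w) = 0.5 * w^T A w, so the gradient is A @ w.
    return np.diag([10.0, 1.0]) @ w

w_mom, v_mom = np.array([1.0, 1.0]), np.zeros(2)
w_nag, v_nag = np.array([1.0, 1.0]), np.zeros(2)
lr, gamma = 0.01, 0.9

for _ in range(100):
    # Classical momentum: accumulate a velocity from past gradients.
    v_mom = gamma * v_mom + lr * grad(w_mom)
    w_mom = w_mom - v_mom
    # Nesterov: evaluate the gradient at the look-ahead point w - gamma * v.
    v_nag = gamma * v_nag + lr * grad(w_nag - gamma * v_nag)
    w_nag = w_nag - v_nag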
5. Stochastic Gradient Descent (SGD)
• Definition: An iterative method for optimizing an objective function using one training example at a time.
• Disadvantages: Can lead to noisy updates and require careful tuning of the learning rate.
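A minimal SGD sketch on the same kind of assumed least-squares problem, taking one noisy gradient step per training example:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5)

w, lr = np.zeros(5), 0.01
for epoch in range(10):
    for i in rng.permutation(len(y)):   # visit the examples in random order
        err = X[i] @ w - y[i]           # residual for a single example
        w -= lr * 2.0 * err * X[i]      # single-example (noisy) gradient step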
6. AdaGrad
• Definition: An adaptive gradient algorithm that adjusts the learning rate for each
parameter based on historical gradient information.
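A sketch of the AdaGrad update, assuming a placeholder gradient function; the per-parameter cache of squared gradients makes the effective learning rate shrink over time:

import numpy as np

def grad(w):
    # Hypothetical gradient of a simple quadratic objective.
    return np.array([10.0, 1.0]) * w

w = np.array([1.0, 1.0])
cache = np.zeros_like(w)        # running sum of squared gradients, one entry per parameter
lr, eps = 0.1, 1e-8

for _ in range(100):
    g = grad(w)
    cache += g ** 2
    w -= lr * g / (np.sqrt(cache) + eps)   # larger gradient history -> smaller step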
7. Adam
• Definition: Combines the advantages of AdaGrad and RMSProp, using adaptive learning
rates and momentum.
• Parameters: β1 (decay rate for the first moment estimate), β2 (decay rate for the second moment estimate), ε (a small constant for numerical stability).
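A sketch of the Adam update using the commonly quoted defaults β1 = 0.9, β2 = 0.999, ε = 1e-8; the objective is again a placeholder assumption:

import numpy as np

def grad(w):
    # Hypothetical gradient of a simple quadratic objective.
    return np.array([10.0, 1.0]) * w

w = np.array([1.0, 1.0])
m, v = np.zeros_like(w), np.zeros_like(w)   # first and second moment estimates
lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8

for t in range(1, 101):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)            # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)            # bias correction for the second moment
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)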
8. RMSProp
• Definition: An optimization algorithm that adjusts the learning rate by dividing the
gradient by a running average of its recent magnitude.
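A sketch of the RMSProp update; the decay factor rho and the placeholder objective are illustrative assumptions:

import numpy as np

def grad(w):
    # Hypothetical gradient of a simple quadratic objective.
    return np.array([10.0, 1.0]) * w

w = np.array([1.0, 1.0])
avg_sq = np.zeros_like(w)        # running average of squared gradients
lr, rho, eps = 0.01, 0.9, 1e-8

for _ in range(100):
    g = grad(w)
    avg_sq = rho * avg_sq + (1 - rho) * g ** 2     # exponential moving average of g^2
    w -= lr * g / (np.sqrt(avg_sq) + eps)          # divide by the recent gradient magnitude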
• Regularization Techniques:
o L1 Regularization: Adds the sum of the absolute values of the weights to the loss function, which encourages sparse weights.
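A minimal sketch of an L1-regularized loss, assuming mean squared error as the base loss and a hypothetical penalty strength lam:

import numpy as np

def l1_regularized_loss(w, X, y, lam=0.01):
    # Base loss (MSE) plus lam times the sum of absolute weight values.
    mse = np.mean((X @ w - y) ** 2)
    return mse + lam * np.sum(np.abs(w))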
• Definition (Denoising Auto-encoders): Train on a noisy, corrupted version of the input data and aim to reconstruct the clean input.
• Objective: Improve the model's robustness and ability to capture relevant structures in
the data.
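A minimal sketch of the denoising objective for a one-hidden-layer auto-encoder, assuming Gaussian corruption and squared-error reconstruction; the noise level and weight shapes are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def denoising_loss(x_clean, W_enc, b_enc, W_dec, b_dec, noise_std=0.3, rng=None):
    rng = rng or np.random.default_rng(0)
    x_noisy = x_clean + noise_std * rng.normal(size=x_clean.shape)  # corrupt the input
    h = sigmoid(W_enc @ x_noisy + b_enc)                            # encode the noisy input
    x_hat = W_dec @ h + b_dec                                       # decode / reconstruct
    return np.mean((x_hat - x_clean) ** 2)                          # compare to the CLEAN input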
• Definition (Contractive Auto-encoders): Penalize the gradient of the encoder's activations with respect to the input, making the learned representation robust to small variations of the input.
• Formula: Add a term to the loss function proportional to the squared Frobenius norm of the Jacobian of the hidden representation with respect to the input.
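For a sigmoid encoder h = sigmoid(W x + b) the Jacobian entries are h_j (1 - h_j) W[j, i], so the penalty has a closed form; a sketch under that assumption:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(x, W, b):
    # Squared Frobenius norm of dh/dx for h = sigmoid(W x + b).
    h = sigmoid(W @ x + b)
    dh_sq = (h * (1.0 - h)) ** 2                 # squared sigmoid derivative, per hidden unit
    return np.sum(dh_sq * np.sum(W ** 2, axis=1))

# Total loss (sketch): reconstruction_error + lambda * contractive_penalty(x, W, b)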
• Definition (Variational Auto-encoders): A generative model that learns a probabilistic distribution over the latent space, allowing new data samples to be generated.
• Objective: Maximize the Evidence Lower Bound (ELBO) to ensure the latent variables
follow a desired distribution.
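Two pieces of the ELBO are usually implemented explicitly, assuming a diagonal Gaussian posterior and a standard normal prior: the KL regularizer and the reparameterization trick used to sample the latent variable; a sketch of both:

import numpy as np

def kl_diag_gaussian(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), the regularizer inside the ELBO.
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

def sample_latent(mu, log_var, rng=None):
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
    rng = rng or np.random.default_rng(0)
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Negative ELBO (to minimize) = reconstruction_error + kl_diag_gaussian(mu, log_var)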
• PCA: Auto-encoders can be seen as a non-linear extension of PCA, which performs linear
dimensionality reduction.
• SVD: Singular Value Decomposition can be used to analyze the linear transformations
performed by auto-encoders.
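A small sketch of the connection, assuming a linear auto-encoder with squared-error loss: the top-k right singular vectors from SVD give the same rank-k reconstruction that such an auto-encoder converges to:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Xc = X - X.mean(axis=0)                  # center the data, as PCA does

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
encode = Vt[:k]                          # top-k principal directions ("encoder" weights)
Z = Xc @ encode.T                        # latent codes
X_hat = Z @ encode                       # linear "decoder" reconstruction
print(np.mean((Xc - X_hat) ** 2))        # error of the best rank-k linear reconstruction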