Deep Learning - Unit 2

The document provides an overview of various neural network architectures and optimization algorithms, including Deep Feedforward Neural Networks, Gradient Descent, and different types of auto-encoders. It discusses their definitions, architectures, formulas, advantages, and applications in machine learning. Additionally, it covers techniques for regularization and dataset augmentation to enhance model performance.

1. Deep Feedforward Neural Networks

• Definition: Deep Feedforward Neural Networks are a type of artificial neural network
where connections between the nodes do not form a cycle. This is the simplest form of
neural networks.

• Architecture: Consists of an input layer, several hidden layers, and an output layer.

• Activation Functions: Commonly used activation functions include Sigmoid, Tanh, and
ReLU.

• Forward Propagation: Involves calculating the output of each neuron layer by layer, from the input layer through the hidden layers to the output layer (a minimal code sketch follows this list).

• Use Cases: Image and speech recognition, language translation, and other applications
requiring pattern recognition.
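
A minimal forward-propagation sketch in NumPy, assuming one hidden layer with ReLU activation and randomly initialized weights; the layer sizes and variable names are illustrative, not taken from the notes:

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Illustrative sizes: 4 inputs -> 8 hidden units -> 3 outputs
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def forward(x):
    h = relu(x @ W1 + b1)     # hidden-layer activations
    return h @ W2 + b2        # output-layer scores

x = rng.normal(size=(1, 4))   # one input example
print(forward(x))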

2. Gradient Descent (GD)

• Definition: An optimization algorithm that minimizes the cost function by iteratively adjusting the model parameters in the direction opposite to the gradient.

• Types of Gradient Descent:

o Batch Gradient Descent: Uses the entire dataset for each update.

o Stochastic Gradient Descent: Uses one training example at each update.

o Mini-batch Gradient Descent: Uses a small random subset of the dataset for each update (the three variants are sketched in code after this list).
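
A minimal NumPy sketch of the three variants for linear regression with squared error; the data, learning rate, and step count are assumed purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # 100 examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def gradient(theta, Xb, yb):
    # Gradient of mean squared error for a linear model on the given subset
    return 2.0 / len(yb) * Xb.T @ (Xb @ theta - yb)

theta, eta = np.zeros(3), 0.1

for step in range(100):
    # Batch GD: the full dataset
    g = gradient(theta, X, y)

    # Stochastic GD (alternative): one random example
    # i = rng.integers(len(y)); g = gradient(theta, X[i:i+1], y[i:i+1])

    # Mini-batch GD (alternative): a small random subset
    # idx = rng.choice(len(y), size=16, replace=False); g = gradient(theta, X[idx], y[idx])

    theta -= eta * g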

3. Momentum-Based GD

• Definition: Enhances gradient descent by adding a momentum term that accelerates convergence and damps oscillations.

• Formula: v(t) = γ·v(t−1) + η·∇J(θ), followed by the parameter update θ = θ − v(t)

o v(t): velocity (momentum term)

o γ: momentum hyperparameter

o η: learning rate

o ∇J(θ): gradient of the cost function
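
A sketch of one momentum step as a reusable function, assuming the gradient is supplied by the caller; the function name and hyperparameter values are illustrative:

import numpy as np

def momentum_step(theta, velocity, grad, gamma=0.9, eta=0.01):
    # v(t) = gamma * v(t-1) + eta * grad;  theta is then moved by -v(t)
    velocity = gamma * velocity + eta * grad
    return theta - velocity, velocity

theta, v = np.zeros(3), np.zeros(3)
grad = np.array([0.5, -1.0, 0.2])      # example gradient
theta, v = momentum_step(theta, v, grad)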

4. Nesterov Accelerated GD
• Definition: An improved version of momentum-based GD that looks ahead to the
estimated future position.

• Formula: v(t) = γ·v(t−1) + η·∇J(θ − γ·v(t−1)), followed by the parameter update θ = θ − v(t)
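
The look-ahead can be sketched by evaluating the gradient at the anticipated position θ − γ·v(t−1); grad_fn is an assumed caller-supplied gradient function:

import numpy as np

def nesterov_step(theta, velocity, grad_fn, gamma=0.9, eta=0.01):
    lookahead = theta - gamma * velocity              # estimated future position
    velocity = gamma * velocity + eta * grad_fn(lookahead)
    return theta - velocity, velocity

# Example with the quadratic cost J(theta) = ||theta||^2, whose gradient is 2*theta
theta, v = np.ones(3), np.zeros(3)
theta, v = nesterov_step(theta, v, lambda t: 2 * t)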

5. Stochastic Gradient Descent (SGD)

• Definition: An iterative method for optimizing an objective function using one training
example at a time.

• Advantages: Faster convergence for large datasets, reduced computational cost.

• Disadvantages: Can lead to noisy updates and require careful tuning of the learning rate.

6. AdaGrad

• Definition: An adaptive gradient algorithm that adjusts the learning rate for each
parameter based on historical gradient information.

• Formula: θ(t+1) = θ(t) − (η / √(G(t) + ϵ))·∇J(θ(t))

o G(t): sum of the squares of the past gradients

o ϵ: small constant to avoid division by zero
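
A minimal AdaGrad sketch in NumPy; G accumulates squared gradients per parameter, so frequently updated parameters receive smaller effective learning rates. Names and values are illustrative:

import numpy as np

def adagrad_step(theta, G, grad, eta=0.1, eps=1e-8):
    G = G + grad ** 2                              # running sum of squared gradients
    theta = theta - eta / np.sqrt(G + eps) * grad  # per-parameter scaled step
    return theta, G

theta, G = np.zeros(3), np.zeros(3)
grad = np.array([0.5, -1.0, 0.2])                  # example gradient
theta, G = adagrad_step(theta, G, grad)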

7. Adam

• Definition: Combines the advantages of AdaGrad and RMSProp, using adaptive learning
rates and momentum.

• Parameters: β1 (decay rate for the first moment), β2 (decay rate for the second moment), ϵ (small constant).

• Formula: m(t) = β1·m(t−1) + (1 − β1)·∇J(θ(t)) and v(t) = β2·v(t−1) + (1 − β2)·(∇J(θ(t)))², followed by the bias-corrected update θ(t+1) = θ(t) − η·m̂(t) / (√v̂(t) + ϵ), where m̂(t) = m(t)/(1 − β1^t) and v̂(t) = v(t)/(1 − β2^t).
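
A sketch of one Adam step with bias correction, assuming the commonly used defaults β1 = 0.9, β2 = 0.999, ϵ = 1e-8:

import numpy as np

def adam_step(theta, m, v, grad, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad            # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2       # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                  # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
grad = np.array([0.5, -1.0, 0.2])                 # example gradient
theta, m, v = adam_step(theta, m, v, grad, t=1)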

8. RMSProp

• Definition: An optimization algorithm that adjusts the learning rate by dividing the
gradient by a running average of its recent magnitude.

• Formula: θ(t+1) = θ(t) − (η / √(E[g²](t) + ϵ))·∇J(θ(t))
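
A minimal RMSProp sketch; E[g²] is the exponentially decaying average of squared gradients, and the decay rate 0.9 is an assumed common default:

import numpy as np

def rmsprop_step(theta, avg_sq, grad, eta=0.01, decay=0.9, eps=1e-8):
    avg_sq = decay * avg_sq + (1 - decay) * grad ** 2   # running average of g^2
    theta = theta - eta / np.sqrt(avg_sq + eps) * grad
    return theta, avg_sq

theta, avg_sq = np.zeros(3), np.zeros(3)
grad = np.array([0.5, -1.0, 0.2])                       # example gradient
theta, avg_sq = rmsprop_step(theta, avg_sq, grad)
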
9. Auto-encoder

• Definition: An unsupervised learning model used to encode input data into a compressed representation and then decode it back to reconstruct the input.

• Architecture: Consists of an encoder (compresses the input) and a decoder (reconstructs the input), as sketched in code after this list.

• Applications: Dimensionality reduction, feature learning, and anomaly detection.
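
A minimal encoder/decoder sketch in PyTorch, assuming 784-dimensional inputs (e.g. flattened 28×28 images) and a 32-dimensional code; the sizes are illustrative:

import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())     # compresses the input to a 32-d code
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())  # reconstructs the 784-d input

x = torch.rand(16, 784)          # a batch of 16 flattened inputs
x_hat = decoder(encoder(x))      # reconstruction
loss = F.mse_loss(x_hat, x)      # reconstruction error to minimize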

10. Regularization in Auto-encoders

• Purpose: Prevent overfitting and improve generalization by adding constraints to the model.

• Techniques:

o L1 Regularization: Adds the sum of the absolute values of the weights to the loss function.

o L2 Regularization: Adds the sum of the squared weights to the loss function.
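
A sketch of adding an L1 or L2 weight penalty to the reconstruction loss, continuing the PyTorch encoder/decoder above; the penalty weight 1e-4 is an assumed example value:

import torch
import torch.nn.functional as F

def regularized_loss(x, x_hat, params, lam=1e-4, kind="l2"):
    recon = F.mse_loss(x_hat, x)
    if kind == "l1":
        penalty = sum(p.abs().sum() for p in params)    # L1: sum of |w|
    else:
        penalty = sum((p ** 2).sum() for p in params)   # L2: sum of w^2
    return recon + lam * penalty

# Usage with the encoder/decoder above:
# loss = regularized_loss(x, x_hat, list(encoder.parameters()) + list(decoder.parameters()))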

11. Denoising Auto-encoders

• Definition: Train on a noisy version of the input data and aim to reconstruct the clean
input.

• Objective: Improve the model's robustness and ability to capture relevant structures in
the data.
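
The denoising objective can be sketched by corrupting the input before encoding while measuring the loss against the clean input; Gaussian noise with standard deviation 0.1 is an assumed choice of corruption:

import torch
import torch.nn.functional as F

def denoising_loss(x, encoder, decoder, noise_std=0.1):
    x_noisy = x + noise_std * torch.randn_like(x)   # corrupt the input
    x_hat = decoder(encoder(x_noisy))               # reconstruct from the corrupted version
    return F.mse_loss(x_hat, x)                     # compare against the clean input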

12. Sparse Auto-encoders

• Definition: Apply sparsity constraints on the hidden layer activations to encourage learning a compact and efficient representation.

• Technique: Use an additional sparsity penalty term in the loss function.
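
One common sparsity penalty is an L1 term on the hidden activations added to the reconstruction loss; a sketch, with the penalty weight 1e-3 assumed purely as an example:

import torch
import torch.nn.functional as F

def sparse_autoencoder_loss(x, encoder, decoder, sparsity_weight=1e-3):
    h = encoder(x)                       # hidden-layer activations (the code)
    x_hat = decoder(h)
    recon = F.mse_loss(x_hat, x)
    sparsity = h.abs().mean()            # L1 penalty pushes most activations toward zero
    return recon + sparsity_weight * sparsity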

13. Contractive Auto-encoders

• Definition: Penalize the gradient of the encoder's activations with respect to the input to
make the learned representation robust to small variations.

• Formula: Add a term to the loss function proportional to the squared Frobenius norm of the Jacobian of the hidden representation with respect to the input.
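
The penalty can be sketched with PyTorch autograd as below; this straightforward formulation is memory-hungry for large models or batches, and the penalty weight is an assumed example value:

import torch
import torch.nn.functional as F
from torch.autograd.functional import jacobian

def contractive_loss(x, encoder, decoder, lam=1e-4):
    # x: a single input vector; the full Jacobian grows quickly for big models or batches
    x_hat = decoder(encoder(x))
    recon = F.mse_loss(x_hat, x)
    J = jacobian(encoder, x, create_graph=True)   # d(code)/d(input), kept in the graph
    penalty = (J ** 2).sum()                      # squared Frobenius norm
    return recon + lam * penalty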

14. Variational Auto-encoder

• Definition: A generative model that learns a probabilistic distribution over the latent
space, allowing for the generation of new data samples.
• Objective: Maximize the Evidence Lower Bound (ELBO) to ensure the latent variables
follow a desired distribution.
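
A compact sketch of the VAE objective in PyTorch, assuming a Gaussian latent with a standard-normal prior; the encoder outputs a mean and log-variance, and the loss is the negative ELBO (reconstruction error plus KL divergence). Layer sizes are illustrative:

import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Linear(784, 64)                               # encoder trunk (sizes are illustrative)
to_mu, to_logvar = nn.Linear(64, 32), nn.Linear(64, 32)
dec = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())  # decoder

def vae_loss(x):
    h = torch.relu(enc(x))
    mu, logvar = to_mu(h), to_logvar(h)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)    # reparameterization trick
    x_hat = dec(z)
    recon = F.mse_loss(x_hat, x, reduction="sum")              # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu ** 2 - logvar.exp()) # KL(q(z|x) || N(0, I))
    return recon + kl                                          # negative ELBO (to minimize)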

15. Auto-encoders relationship with PCA and SVD

• PCA: Auto-encoders can be seen as a non-linear extension of PCA, which performs linear
dimensionality reduction.

• SVD: Singular Value Decomposition can be used to compute PCA and to analyze the linear transformations performed by linear auto-encoders.
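
As a concrete illustration of the SVD route, PCA can be computed from the SVD of the centered data, and a linear auto-encoder with squared-error loss learns the same subspace as the top principal directions; the data below are random, purely for demonstration:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # 200 examples, 10 features
Xc = X - X.mean(axis=0)                        # center the data

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
components = Vt[:k]                            # top-k principal directions

codes = Xc @ components.T                      # "encoder": project onto the subspace
X_recon = codes @ components + X.mean(axis=0)  # "decoder": map back and un-center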

16. Dataset Augmentation

• Definition: Techniques to artificially increase the size and diversity of a dataset by applying transformations such as rotation, scaling, flipping, and adding noise.

• Purpose: Improve the model's generalization by providing more varied training examples.
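
A minimal NumPy sketch of simple image augmentations (flip, 90-degree rotation, additive Gaussian noise); real pipelines typically use a library such as torchvision, but the operations below rely only on NumPy, and the noise level is an assumed example value:

import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))                   # stand-in for a grayscale image

flipped = np.fliplr(image)                     # horizontal flip
rotated = np.rot90(image)                      # 90-degree rotation
noisy = np.clip(image + 0.05 * rng.normal(size=image.shape), 0.0, 1.0)  # additive noise

augmented_batch = np.stack([image, flipped, rotated, noisy])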
