Lab 4: Optimization
Prof. HSSAYNI
Goal
Implement advanced gradient descent methods and compare their convergence behavior, speed,
and stability.
1. Momentum
• Update parameters according to:
$$v = \beta v + (1 - \beta)\,\nabla_\theta J(\theta)$$
$$\theta = \theta - \alpha v$$
• Visualizations:
– Plot the cost function over iterations.
– Compare momentum's convergence with standard gradient descent.
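A minimal Python sketch of this update, assuming a one-dimensional quadratic cost J(θ) = θ² as a stand-in for the lab's cost function; the names grad_J and momentum_gd and all hyperparameter values are illustrative choices, not part of the lab statement.

```python
def grad_J(theta):
    # Gradient of the assumed toy cost J(theta) = theta**2 (illustrative only).
    return 2.0 * theta

def momentum_gd(theta0=5.0, alpha=0.1, beta=0.9, n_iters=100):
    # Momentum update: v = beta*v + (1 - beta)*grad ; theta = theta - alpha*v
    theta, v = theta0, 0.0
    costs = []
    for _ in range(n_iters):
        g = grad_J(theta)
        v = beta * v + (1.0 - beta) * g    # exponential moving average of gradients
        theta = theta - alpha * v          # momentum step
        costs.append(theta ** 2)           # record J(theta) for plotting
    return theta, costs

theta_final, costs = momentum_gd()
print(f"final theta: {theta_final:.6f}, final cost: {costs[-1]:.3e}")
```

The returned costs list can be plotted against the iteration index to produce the cost-over-iterations curve.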
2. AdaGrad (Adaptive Gradient Algorithm)
Objective: Implement AdaGrad and observe how adaptive learning rates affect convergence.
Instructions:
• Initialize the parameter θ and the gradient accumulation variable G = 0.
• Update parameters according to:
$$G = G + \left(\nabla_\theta J(\theta)\right)^2$$
$$\theta = \theta - \frac{\alpha}{\sqrt{G + \epsilon}}\,\nabla_\theta J(\theta)$$
where ϵ is a small constant to avoid division by zero.
• Visualizations:
– Plot the cost function over iterations.
– Compare AdaGrad’s convergence with standard gradient descent.
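A short sketch of the AdaGrad loop, again assuming the toy cost J(θ) = θ²; the function names and hyperparameter values are illustrative assumptions.

```python
def grad_J(theta):
    # Gradient of the assumed toy cost J(theta) = theta**2 (illustrative only).
    return 2.0 * theta

def adagrad(theta0=5.0, alpha=0.5, eps=1e-8, n_iters=100):
    # G accumulates squared gradients; the step is scaled by 1/sqrt(G + eps).
    theta, G = theta0, 0.0
    costs = []
    for _ in range(n_iters):
        g = grad_J(theta)
        G = G + g ** 2                                    # accumulate squared gradient
        theta = theta - alpha / (G + eps) ** 0.5 * g      # adaptive step size
        costs.append(theta ** 2)                          # record J(theta) for plotting
    return theta, costs

theta_final, costs = adagrad()
print(f"final theta: {theta_final:.6f}, final cost: {costs[-1]:.3e}")
```

Because G only grows, the effective learning rate shrinks over time, which is the behavior to compare against standard gradient descent in the plots.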
5. Nadam (Nesterov-accelerated Adaptive Moment Estimation)
Objective: Implement Nadam to study the effect of Nesterov acceleration on Adam's convergence.
Instructions:
$$m = \beta_1 m + (1 - \beta_1)\,\nabla_\theta J(\theta)$$
$$v = \beta_2 v + (1 - \beta_2)\left(\nabla_\theta J(\theta)\right)^2$$
$$\hat{m} = \frac{m}{1 - \beta_1^t}, \qquad \hat{v} = \frac{v}{1 - \beta_2^t}$$
$$\theta = \theta - \frac{\alpha}{\sqrt{\hat{v}} + \epsilon}\left(\beta_1 \hat{m} + (1 - \beta_1)\,\nabla_\theta J(\theta)\right)$$
• Visualizations:
– Plot the cost function over iterations.
– Compare Nadam's convergence with Adam.
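A sketch of the Nadam loop implementing the update rule above, under the same assumed toy cost J(θ) = θ²; the β values, step size, and function names are illustrative assumptions.

```python
def grad_J(theta):
    # Gradient of the assumed toy cost J(theta) = theta**2 (illustrative only).
    return 2.0 * theta

def nadam(theta0=5.0, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, n_iters=200):
    theta, m, v = theta0, 0.0, 0.0
    costs = []
    for t in range(1, n_iters + 1):
        g = grad_J(theta)
        m = beta1 * m + (1 - beta1) * g            # first-moment estimate
        v = beta2 * v + (1 - beta2) * g ** 2       # second-moment estimate
        m_hat = m / (1 - beta1 ** t)               # bias correction
        v_hat = v / (1 - beta2 ** t)
        # Nesterov-style look-ahead: mix the corrected moment with the current gradient.
        theta = theta - alpha / (v_hat ** 0.5 + eps) * (beta1 * m_hat + (1 - beta1) * g)
        costs.append(theta ** 2)                   # record J(theta) for plotting
    return theta, costs

theta_final, costs = nadam()
print(f"final theta: {theta_final:.6f}, final cost: {costs[-1]:.3e}")
```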
Discussion Questions
• Speed vs. Stability: How do the convergence speed and stability differ across methods?
Which method converged fastest? Which was the most stable?
• Effect of Parameters: How do the choices of learning rate, momentum, and decay rates
affect each method’s performance?