Autoencoder Loss Minimization
We solve:
α = 4β / (1 + 4β²),   β = 4α / (1 + 4α²)
Substituting β into α:
α = 4 · (4α / (1 + 4α²)) / (1 + 4 · (4α / (1 + 4α²))²)

Multiplying through by (1 + 4α²)² and discarding the trivial root α = 0 gives 16α⁴ + 8α² − 15 = 0, so α² = 3/4. Substituting back gives the same value for β, hence

α = β = √3/2 ≈ 0.866.
Thus, the optimal values of α and β that minimize the loss function are:

α = β = √3/2 ≈ 0.866.
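As a quick numerical sanity check, here is a minimal NumPy sketch that uses only the two stationarity conditions above (not the full loss); plugging the symmetric solution back into both equations confirms it is a fixed point:

import numpy as np

alpha = beta = np.sqrt(3) / 2                  # candidate solution, ≈ 0.866

alpha_check = 4 * beta / (1 + 4 * beta**2)     # should reproduce alpha
beta_check = 4 * alpha / (1 + 4 * alpha**2)    # should reproduce beta
print(alpha_check, beta_check)                 # both ≈ 0.8660
assert np.isclose(alpha_check, alpha) and np.isclose(beta_check, beta)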
What will be the weights w1 and w3 at iteration t+1, assuming backpropagation with ordinary gradient descent is used?

Note: d/dx tanh(x) = 1 − tanh²(x)
1. Input layer: x1 = 1, x2 = 1
L = − ∑ [ y log(o) + (1 − y) log(1 − o) ]
At iteration t:
w1 = 0, w2 = 1, w3 = 1, w4 = 0.
Hidden layer: h = tanh(w1 x1 + w2 x2) = tanh(0 · 1 + 1 · 1) = tanh(1) ≈ 0.7616
Output Layer Computation
σ(x) = 1 / (1 + e^(−x))
For o1:

o1 = σ(w3 h) = 1 / (1 + e^(−0.7616)) ≈ 0.6817
For o2:

o2 = σ(w4 h) = 1 / (1 + e^0) = 0.5
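As a numerical check of these three numbers, here is a minimal forward-pass sketch, assuming the network structure implied above (one tanh hidden unit feeding two sigmoid outputs):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x1, x2 = 1, 1
w1, w2, w3, w4 = 0, 1, 1, 0

h = np.tanh(w1 * x1 + w2 * x2)   # tanh(1) ≈ 0.7616
o1 = sigmoid(w3 * h)             # ≈ 0.6817
o2 = sigmoid(w4 * h)             # sigmoid(0) = 0.5
print(h, o1, o2)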
Assuming y1 = 1, y2 = 1:

∂L/∂o = o − y
δ1 = o1 − y1 = 0.6817 − 1 = −0.3183
δ2 = o2 − y2 = 0.5 − 1 = −0.5
Weight gradients:
∂L/∂w3 = δ1 h = (−0.3183)(0.7616) ≈ −0.2424

∂L/∂w4 = δ2 h = (−0.5)(0.7616) = −0.3808
δh = (w3 δ1 + w4 δ2)(1 − h²)
   = (1 · (−0.3183) + 0 · (−0.5))(1 − 0.7616²)
   = (−0.3183)(0.42) ≈ −0.1337
∂L/∂w1 = δh x1 = (−0.1337)(1) = −0.1337

∂L/∂w2 = δh x2 = (−0.1337)(1) = −0.1337
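The same chain-rule gradients can be reproduced numerically. This sketch repeats the forward pass and then applies exactly the delta equations derived above, with no assumptions beyond the network structure already used:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x1, x2 = 1, 1
y1, y2 = 1, 1
w1, w2, w3, w4 = 0, 1, 1, 0

h = np.tanh(w1 * x1 + w2 * x2)
o1, o2 = sigmoid(w3 * h), sigmoid(w4 * h)

delta1, delta2 = o1 - y1, o2 - y2                    # output errors o - y
dL_dw3 = delta1 * h                                  # ≈ -0.2424
dL_dw4 = delta2 * h                                  # ≈ -0.3808
delta_h = (w3 * delta1 + w4 * delta2) * (1 - h**2)   # ≈ -0.1337
dL_dw1 = delta_h * x1                                # ≈ -0.1337
dL_dw2 = delta_h * x2                                # ≈ -0.1337
print(dL_dw1, dL_dw2, dL_dw3, dL_dw4)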
The weight update with momentum is:

w^(t+1) = w^(t) − η ∂L/∂w + μ Δw^(t)

where η = 0.25 is the learning rate, μ is the momentum coefficient, and Δw^(t) is the weight change from the previous iteration.
We assume Δw^(t) = 0 since no past weight changes are given.
w1^(t+1) = w1 − 0.25(−0.1337) = 0 + 0.0334 = 0.0334

w3^(t+1) = w3 − 0.25(−0.2424) = 1 + 0.0606 = 1.0606
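A quick two-line check of these updates under the Δw^(t) = 0 assumption, reusing the gradient values derived above:

learning_rate = 0.25
dL_dw1, dL_dw3 = -0.1337, -0.2424

w1_next = 0 - learning_rate * dL_dw1   # 0 + 0.0334 ≈ 0.0334
w3_next = 1 - learning_rate * dL_dw3   # 1 + 0.0606 ≈ 1.0606
print(w1_next, w3_next)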
Autoencoder Training
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh_derivative(x):
    return 1 - np.tanh(x)**2

# Given parameters
x1, x2 = 1, 1                    # Input values
y1, y2 = 1, 1                    # Target values
w1, w2, w3, w4 = 0, 1, 1, 0      # Weights at iteration t
learning_rate = 0.25
momentum = 0.75
prev_w1, prev_w3 = -0.5, 0.5     # Weights at iteration t-1

# Forward pass
z = w1 * x1 + w2 * x2            # Hidden pre-activation
h = np.tanh(z)                   # Hidden activation, ≈ 0.7616
o1 = sigmoid(w3 * h)             # ≈ 0.6817
o2 = sigmoid(w4 * h)             # 0.5

# Compute loss (binary cross-entropy over both outputs)
loss = - (y1 * np.log(o1) + (1 - y1) * np.log(1 - o1)) \
       - (y2 * np.log(o2) + (1 - y2) * np.log(1 - o2))

# Compute gradients
delta1 = o1 - y1                 # Output error for o1
delta2 = o2 - y2                 # Output error for o2
dL_dw3 = delta1 * h              # ≈ -0.2424
dL_dw4 = delta2 * h              # ≈ -0.3808
delta_h = (w3 * delta1 + w4 * delta2) * tanh_derivative(z)  # ≈ -0.1337
dL_dw1 = delta_h * x1
dL_dw2 = delta_h * x2

# Update weights (gradient step plus momentum term from the t-1 weights)
w1_new = w1 - learning_rate * dL_dw1 + momentum * (w1 - prev_w1)
w3_new = w3 - learning_rate * dL_dw3 + momentum * (w3 - prev_w3)

print("loss:", loss)
print("w1(t+1):", w1_new)
print("w3(t+1):", w3_new)
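Note that, unlike the hand calculation above (which takes Δw^(t) = 0 for ordinary gradient descent), this script keeps the momentum term with the given iteration t−1 weights, so it prints larger updates, roughly w1 ≈ 0.41 and w3 ≈ 1.44 rather than 0.0334 and 1.0606.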
The provided Python code implements these calculations. Let me know if you need further
clarifications! 🚀