Mathematics For Machine Learning V5
Equations (V5)
1. Linear Algebra
• Addition of Vectors:
u + v = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{pmatrix}
• Scaling a Vector:
c \cdot v = \begin{pmatrix} c v_1 \\ c v_2 \\ \vdots \\ c v_n \end{pmatrix}
• Matrix-Vector Product:
Av = \begin{pmatrix} \sum_{j=1}^{n} a_{1j} v_j \\ \sum_{j=1}^{n} a_{2j} v_j \\ \vdots \\ \sum_{j=1}^{n} a_{mj} v_j \end{pmatrix}
• Matrix Trace:
\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}
• Eigenvector Equation:
Av = λv
• Vector Projection:
\operatorname{proj}_{b}(a) = \frac{a \cdot b}{b \cdot b}\, b
• Inverse of a 2x2 Matrix:
A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}
• Orthogonality Condition:
u · v = 0 if u and v are orthogonal.
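A minimal NumPy sketch (assuming NumPy is available) checking a few of the identities above; the matrix A and the vectors a, b, v are illustrative placeholders:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.0])

# Matrix-vector product Av (the componentwise sums above)
Av = A @ v

# Eigenvector equation: each column of `vecs` satisfies A v = lambda v
vals, vecs = np.linalg.eig(A)
assert np.allclose(A @ vecs[:, 0], vals[0] * vecs[:, 0])

# Projection of a onto b: proj_b(a) = (a . b / b . b) b
a, b = np.array([3.0, 4.0]), np.array([1.0, 0.0])
proj = (a @ b) / (b @ b) * b   # -> [3., 0.]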
2. Probability and Statistics
• Variance (Alternative):
\mathrm{Var}(X) = E[X^2] - (E[X])^2
• Covariance (Alternative):
\mathrm{Cov}(X, Y) = E[XY] - E[X]\,E[Y]
• Entropy:
H(X) = -\sum_{x} P(x) \log P(x)
• KL Divergence:
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
• Conditional Expectation:
E[Y \mid X] = \int y\, f_{Y \mid X}(y \mid x)\, dy
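A short NumPy illustration of the entropy and KL-divergence sums for discrete distributions; the distributions P and Q below are made-up examples:

import numpy as np

P = np.array([0.5, 0.25, 0.25])   # example distribution
Q = np.array([0.4, 0.3, 0.3])     # example reference distribution

entropy = -np.sum(P * np.log(P))   # H(P)
kl = np.sum(P * np.log(P / Q))     # D_KL(P || Q)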
3. Calculus
• Power Rule:
\frac{d}{dx}\left[x^n\right] = n x^{n-1}
• Product Rule:
\frac{d}{dx}[uv] = u \frac{dv}{dx} + v \frac{du}{dx}
• Quotient Rule:
\frac{d}{dx}\left[\frac{u}{v}\right] = \frac{v \frac{du}{dx} - u \frac{dv}{dx}}{v^2}
• Exponential Derivative:
\frac{d}{dx}\left[e^x\right] = e^x
• Logarithmic Derivative:
\frac{d}{dx}[\ln x] = \frac{1}{x}
• Integral of a Power Function:
\int x^n\, dx = \frac{x^{n+1}}{n+1} + C \quad \text{for } n \neq -1
• Jacobian Matrix:
J_{ij} = \frac{\partial f_i}{\partial x_j}
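The Jacobian entries J_ij = ∂f_i/∂x_j can be approximated numerically with central differences; a sketch in NumPy, where the function f and the evaluation point are illustrative:

import numpy as np

def f(x):
    # example vector-valued function f: R^2 -> R^2
    return np.array([x[0] ** 2 * x[1], 5.0 * x[0] + np.sin(x[1])])

def numerical_jacobian(f, x, h=1e-6):
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2.0 * h)  # central difference in x_j
    return J

J = numerical_jacobian(f, np.array([1.0, 2.0]))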
4. Optimization
• Stochastic Gradient Descent (SGD):
w \leftarrow w - \eta \nabla J(w; x_i, y_i)
• RMSProp Update Rule:
s \leftarrow \rho s + (1 - \rho)\,(\nabla J(w))^2, \qquad w \leftarrow w - \frac{\eta}{\sqrt{s + \epsilon}}\, \nabla J(w)
• Adam Optimization:
m \leftarrow \beta_1 m + (1 - \beta_1)\,\nabla J(w), \qquad v \leftarrow \beta_2 v + (1 - \beta_2)\,(\nabla J(w))^2
\hat{m} = \frac{m}{1 - \beta_1^t}, \qquad \hat{v} = \frac{v}{1 - \beta_2^t}, \qquad w \leftarrow w - \frac{\eta\, \hat{m}}{\sqrt{\hat{v}} + \epsilon}
• Gradient Clipping:
\nabla J(w) \leftarrow \frac{\nabla J(w)}{\max\!\left(1, \|\nabla J(w)\| / c\right)}
• Newton’s Method:
w_{t+1} = w_t - \eta H^{-1} \nabla J(w_t)
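A hedged NumPy sketch of the Adam update above on a toy objective J(w) = ||w||²; the objective, starting point, and hyperparameters are placeholders:

import numpy as np

def grad_J(w):
    return 2.0 * w                       # gradient of the toy objective J(w) = ||w||^2

w = np.array([1.0, -2.0])
eta, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
m = np.zeros_like(w)
v = np.zeros_like(w)

for t in range(1, 101):
    g = grad_J(w)
    # biased first and second moment estimates, then bias correction
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)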
5. Regression Models
• Linear Regression Hypothesis:
\hat{y} = Xw + b
• Normal Equation:
w = (X^\top X)^{-1} X^\top y
• Logistic Regression Hypothesis:
\hat{y} = \sigma(Xw + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
• Cross-Entropy Loss:
J(w) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
• Adjusted R-squared:
\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}
• Huber Loss:
L_\delta(a) = \begin{cases} \frac{1}{2} a^2 & \text{if } |a| \le \delta, \\ \delta\left(|a| - \frac{1}{2}\delta\right) & \text{if } |a| > \delta \end{cases}
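A small NumPy sketch of the logistic hypothesis, the cross-entropy loss, and batch gradient steps; the data X, labels y, and learning rate are toy placeholders:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # toy design matrix
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float) # toy binary labels

w, b, eta = np.zeros(3), 0.0, 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(200):
    y_hat = sigmoid(X @ w + b)
    p = np.clip(y_hat, 1e-12, 1 - 1e-12)               # avoid log(0)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad_w = X.T @ (y_hat - y) / len(y)                # gradient of the cross-entropy loss
    grad_b = np.mean(y_hat - y)
    w -= eta * grad_w
    b -= eta * grad_b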
6. Neural Networks
• Perceptron Update Rule:
w ← w + η(y − ŷ)x
• ReLU Activation:
f(x) = \max(0, x)
• Softmax Function:
\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}
• Sigmoid Unit Activation:
a = \sigma(w^\top x + b)
• Gradient of the Loss (single example, sigmoid unit):
\frac{\partial J}{\partial w} = x\,(\hat{y} - y)
• Dropout Regularization:
\tilde{h}_i^{(l)} = r_i\, h_i^{(l)}, \qquad r_i \sim \mathrm{Bernoulli}(p)
• Batch Normalization:
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta
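A NumPy sketch of a numerically stable softmax and the batch-normalization transform above; the inputs, gamma, and beta are illustrative:

import numpy as np

def softmax(z):
    z = z - np.max(z)                    # subtract the max for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mu = x.mean(axis=0)                  # batch mean mu_B
    var = x.var(axis=0)                  # batch variance sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

probs = softmax(np.array([2.0, 1.0, 0.1]))
normed = batch_norm(np.random.default_rng(0).normal(size=(8, 4)))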
7. Clustering
• k-Means Objective Function:
J = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - \mu_k\|^2
• Silhouette Score:
s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}
• DBSCAN Core Point Condition:
|N_\epsilon(p)| \ge \mathrm{minPts}, \qquad N_\epsilon(p) = \{ q : \mathrm{dist}(p, q) \le \epsilon \}
• Expectation-Maximization (E-step):
\gamma_{ik} = \frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
• Expectation-Maximization (M-step):
\mu_k = \frac{\sum_{i=1}^{N} \gamma_{ik}\, x_i}{\sum_{i=1}^{N} \gamma_{ik}} \qquad \text{and} \qquad \Sigma_k = \frac{\sum_{i=1}^{N} \gamma_{ik}\, (x_i - \mu_k)(x_i - \mu_k)^\top}{\sum_{i=1}^{N} \gamma_{ik}}
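A brief NumPy sketch of the k-means objective for a given set of centroids and nearest-centroid assignments; the data and initial centroids are toy placeholders:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                       # toy data
centroids = X[:3]                                  # three arbitrary points as toy centroids

# assign each point to its nearest centroid (the clusters C_k in the objective)
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
labels = np.argmin(dists, axis=1)

# J = sum_k sum_{i in C_k} ||x_i - mu_k||^2
J = sum(np.sum((X[labels == k] - centroids[k]) ** 2) for k in range(3))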
8. Dimensionality Reduction
• Principal Component Analysis (PCA) Objective:
Cw = \lambda w
• Singular Value Decomposition (SVD):
X = U \Sigma V^\top
• Linear Discriminant Analysis (LDA) Objective:
J(w) = \frac{w^\top S_b w}{w^\top S_w w}
• Autoencoder Reconstruction:
X ≈ g(f (X))
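PCA via the SVD of the centered data matrix, as a NumPy sketch; the data is a random placeholder and the number of retained components is arbitrary:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)                   # center the data

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]                       # top-2 principal directions (rows of V^T)
Z = Xc @ components.T                     # projected coordinates

# Equivalent eigen view: C w = lambda w with C the sample covariance matrix
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)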
9. Probability Distributions
• Bernoulli Distribution:
P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\}
• Binomial Distribution:
P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k \in \{0, 1, \dots, n\}
• Poisson Distribution:
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k \ge 0
• Uniform Distribution:
f(x) = \begin{cases} \frac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}
• Normal Distribution:
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}
• Exponential Distribution:
f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}
• Beta Distribution:
f(x; \alpha, \beta) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad x \in [0, 1]
• Gamma Distribution:
f(x; \alpha, \beta) = \frac{\beta^\alpha x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}, \quad x \ge 0
• Multinomial Distribution:
P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}
• Chi-Square Distribution:
f(x; k) = \frac{x^{\frac{k}{2} - 1} e^{-\frac{x}{2}}}{2^{k/2}\, \Gamma\!\left(\frac{k}{2}\right)}, \quad x \ge 0
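As a quick sanity check, the normal density above can be implemented directly and compared against scipy.stats (assuming SciPy is installed; the values of mu and sigma are examples):

import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 2.0
x = np.linspace(-5, 7, 50)

# direct implementation of the normal pdf formula
pdf_manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
assert np.allclose(pdf_manual, norm.pdf(x, loc=mu, scale=sigma))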
10. Reinforcement Learning
• Bellman Equation for State-Value Function:
V^\pi(s) = \mathbb{E}_\pi\!\left[ R_{t+1} + \gamma V^\pi(S_{t+1}) \mid S_t = s \right]
• Policy Improvement:
\pi'(s) = \arg\max_{a} Q(s, a)
• Reward Function:
R(s, a) = E[R_t \mid S_t = s, A_t = a]
• Discounted Return:
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
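A short Python sketch of the discounted return G_t for a finite reward sequence; the rewards and the discount factor gamma are illustrative:

gamma = 0.9
rewards = [1.0, 0.0, 2.0, 3.0]            # R_{t+1}, R_{t+2}, ...

# G_t = sum_k gamma^k R_{t+k+1}
G = sum(gamma ** k * r for k, r in enumerate(rewards))

# equivalent backward recursion: G_t = R_{t+1} + gamma * G_{t+1}
G_rec = 0.0
for r in reversed(rewards):
    G_rec = r + gamma * G_rec
assert abs(G - G_rec) < 1e-12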