
Mathematics for Machine Learning: Essential Equations

1. Linear Algebra
Vectors and Matrices
• Dot Product:
  a \cdot b = \sum_{i=1}^{n} a_i b_i

• Matrix Multiplication:
  (AB)_{ij} = \sum_{k} A_{ik} B_{kj}

• Transpose:
  (A^T)_{ij} = A_{ji}

• Inverse:
  A A^{-1} = I

Key Properties
• Determinant: det(A)
• Eigenvalues and Eigenvectors: Av = λv
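
A minimal NumPy sketch of the operations above; the array values are arbitrary examples chosen here for illustration, not taken from the text.

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([4.0, 5.0, 6.0])
    A = np.array([[2.0, 1.0], [1.0, 3.0]])
    B = np.array([[0.0, 1.0], [1.0, 0.0]])

    dot = a @ b                          # a . b = sum_i a_i b_i
    prod = A @ B                         # (AB)_ij = sum_k A_ik B_kj
    transpose = A.T                      # (A^T)_ij = A_ji
    identity = A @ np.linalg.inv(A)      # A A^{-1} = I (up to floating-point error)
    det = np.linalg.det(A)               # determinant det(A)
    eigvals, eigvecs = np.linalg.eig(A)  # each column v of eigvecs satisfies A v = lambda v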

2. Probability and Statistics


Basic Definitions
• Probability Density Function (PDF), for a Normal Distribution:
  p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

• Expected Value:
  E[X] = \int x \, p(x) \, dx \quad \text{(continuous)} \qquad E[X] = \sum_{i} x_i \, p(x_i) \quad \text{(discrete)}

• Variance:
  \mathrm{Var}(X) = E[(X - E[X])^2]

Bayes’ Theorem
  P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

KL Divergence
  D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(x_i) \log \frac{P(x_i)}{Q(x_i)}
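
A small NumPy sketch of these quantities; the distributions, probabilities, and the values of mu, sigma, P(A), P(B|A), and P(B) are made-up examples for illustration only.

    import numpy as np

    # Normal PDF at x, for an assumed mu = 0, sigma = 1
    mu, sigma = 0.0, 1.0
    x = 0.5
    pdf = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

    # Discrete expected value and variance for an example distribution
    xs = np.array([0.0, 1.0, 2.0])
    ps = np.array([0.2, 0.5, 0.3])        # probabilities sum to 1
    e_x = np.sum(xs * ps)                 # E[X] = sum_i x_i p(x_i)
    var_x = np.sum((xs - e_x) ** 2 * ps)  # Var(X) = E[(X - E[X])^2]

    # Bayes' theorem with assumed values for P(A), P(B|A), P(B)
    p_a, p_b_given_a, p_b = 0.01, 0.9, 0.05
    p_a_given_b = p_b_given_a * p_a / p_b

    # KL divergence between two example discrete distributions P and Q
    P = np.array([0.4, 0.6])
    Q = np.array([0.5, 0.5])
    kl = np.sum(P * np.log(P / Q))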

3. Calculus
Derivatives
• Gradient (Vector of Partial Derivatives):
  \nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots \right)

• Chain Rule:
  \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u} \cdot \frac{\partial u}{\partial x}

Optimization
• Gradient Descent Update:
  w \leftarrow w - \eta \nabla f(w)
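
As a rough numerical check of the gradient and the update rule, here is a sketch on a simple quadratic; the objective f, the step size eta, and the starting point are arbitrary choices, not from the text.

    import numpy as np

    def f(w):
        # Example objective: f(w) = w1^2 + 3*w2^2, minimized at (0, 0)
        return w[0] ** 2 + 3 * w[1] ** 2

    def numerical_gradient(f, w, eps=1e-6):
        # Central-difference estimate of the vector of partial derivatives
        grad = np.zeros_like(w)
        for i in range(len(w)):
            step = np.zeros_like(w)
            step[i] = eps
            grad[i] = (f(w + step) - f(w - step)) / (2 * eps)
        return grad

    w = np.array([1.0, -2.0])
    eta = 0.1
    for _ in range(100):
        w = w - eta * numerical_gradient(f, w)  # w <- w - eta * grad f(w)
    # w is now very close to the minimizer (0, 0)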

4. Linear Regression
• Hypothesis:
ŷ = Xw + b
• Cost Function (Mean Squared Error):
  J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2

• Normal Equation (Closed-Form Solution):
  w = (X^T X)^{-1} X^T y
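
A minimal sketch of the closed-form solution on synthetic data generated here purely for illustration; the bias term is omitted, and np.linalg.solve is used instead of an explicit matrix inverse for numerical stability.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 100, 3
    X = rng.normal(size=(m, n))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=m)  # noisy linear targets

    # Normal equation: w = (X^T X)^{-1} X^T y
    w = np.linalg.solve(X.T @ X, X.T @ y)

    y_hat = X @ w
    cost = np.mean((y_hat - y) ** 2) / 2       # J(w) = (1/2m) * sum (y_hat_i - y_i)^2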

5. Logistic Regression
• Hypothesis (Sigmoid Function):
  \hat{y} = \sigma(Xw) = \frac{1}{1 + e^{-Xw}}

• Cost Function (Binary Cross-Entropy):
  J(w) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
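
A short sketch of the sigmoid hypothesis and the binary cross-entropy cost; the features, weights, and labels below are arbitrary examples, and the bias term is folded into the weights.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    X = np.array([[0.5, 1.2], [-1.0, 0.3], [2.0, -0.7]])
    w = np.array([0.8, -0.4])
    y = np.array([1.0, 0.0, 1.0])

    y_hat = sigmoid(X @ w)   # predicted probabilities
    eps = 1e-12              # guard against log(0)
    cost = -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))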

6. Neural Networks
• Forward Propagation:
  z = w^T x + b, \qquad a = \sigma(z)

• Backpropagation: Gradient of the loss with respect to the weights, computed via the chain rule.

• Loss Function (Cross-Entropy for Classification):
  L = -\sum_{i} y_i \log(\hat{y}_i)
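
A toy sketch of one forward pass and the cross-entropy loss. The layer sizes, random weights, and the softmax output layer are illustrative assumptions not stated in the text, and the backpropagation (chain-rule) gradients are not implemented here.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)
    x = rng.normal(size=3)                 # one input example with 3 features
    y = np.array([0.0, 1.0, 0.0])          # one-hot target over 3 classes

    # Forward propagation through one hidden layer: z = W x + b, a = sigma(z)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
    W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)
    a1 = sigmoid(W1 @ x + b1)

    # Softmax output layer (an assumption) turns scores into class probabilities
    z2 = W2 @ a1 + b2
    y_hat = np.exp(z2 - z2.max()) / np.sum(np.exp(z2 - z2.max()))

    loss = -np.sum(y * np.log(y_hat + 1e-12))  # L = -sum_i y_i log(y_hat_i)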

7. Principal Component Analysis (PCA)


• Covariance Matrix:
  C = \frac{1}{m} X^T X

• Eigendecomposition for dimensionality reduction:
  X_{\text{reduced}} = X W, where the columns of W are the top eigenvectors.
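
A compact sketch of PCA via the covariance matrix and its eigendecomposition; the random data, the centering step, and the choice of keeping the top 2 components are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(200, 5))
    X = X - X.mean(axis=0)                 # centre the data first

    C = (X.T @ X) / X.shape[0]             # covariance matrix C = (1/m) X^T X

    eigvals, eigvecs = np.linalg.eigh(C)   # eigh, since C is symmetric
    order = np.argsort(eigvals)[::-1]      # sort eigenvalues in decreasing order
    W = eigvecs[:, order[:2]]              # top-2 eigenvectors as columns

    X_reduced = X @ W                      # project onto the principal components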

8. Support Vector Machines (SVM)


• Optimization Problem:
  \min_{w, b} \; \frac{1}{2} \|w\|^2
  \text{subject to: } y_i (w^T x_i + b) \ge 1
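
The hard-margin problem above is normally handed to a quadratic-programming solver; the sketch below only evaluates the objective and checks the margin constraints for a hand-picked (w, b) on a toy separable dataset, to illustrate what the constraints mean.

    import numpy as np

    # Toy 2-D data; labels must be +1 / -1 for the constraint y_i (w^T x_i + b) >= 1
    X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])

    # Hand-picked candidate separator (an assumption, not a solver output)
    w = np.array([0.5, 0.5])
    b = 0.0

    objective = 0.5 * np.dot(w, w)           # (1/2) ||w||^2
    margins = y * (X @ w + b)                # functional margins y_i (w^T x_i + b)
    feasible = bool(np.all(margins >= 1.0))  # True if every constraint is satisfied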

9. Clustering (e.g., k-Means)


• Centroid Update Equation:
  \mu_k = \frac{\sum_{x \in C_k} x}{|C_k|}

• Objective: minimize the within-cluster sum of squares:
  \sum_{k} \sum_{x \in C_k} \|x - \mu_k\|^2
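
A brief sketch of a few k-means iterations (assignment, centroid update, and the within-cluster sum of squares); the synthetic two-blob data, k = 2, and the iteration count are arbitrary, and the loop assumes no cluster goes empty.

    import numpy as np

    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(5, 1, size=(50, 2))])
    k = 2
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initialization

    for _ in range(10):
        # Assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Centroid update: mu_k = (sum of points in C_k) / |C_k|
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    # Within-cluster sum of squares: sum_k sum_{x in C_k} ||x - mu_k||^2
    wcss = sum(np.sum((X[labels == j] - centroids[j]) ** 2) for j in range(k))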

10. Gradient-Based Learning


• Stochastic Gradient Descent (SGD) Update Rule:

w = w − η∇J(w; x, y)
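
A minimal sketch of the SGD update applied to the linear-regression cost from Section 4, drawing one example per step; the learning rate, step count, and synthetic data are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(4)
    m, n = 200, 3
    X = rng.normal(size=(m, n))
    true_w = np.array([1.5, -2.0, 0.7])
    y = X @ true_w + 0.05 * rng.normal(size=m)

    w = np.zeros(n)
    eta = 0.01
    for step in range(5000):
        i = rng.integers(m)              # pick one example (x_i, y_i) at random
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of (1/2)(y_hat_i - y_i)^2 w.r.t. w
        w = w - eta * grad               # w <- w - eta * grad J(w; x_i, y_i)
    # w is now approximately equal to true_w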
