Machine Learning Notes
Slope = Vertical change / Horizontal change

Through (−2, −2) and (1, 2):
m = (2 − (−2)) / (1 − (−2)) = 4/3

Through (−4, −1) and (2, 1):
m = (1 − (−1)) / (2 − (−4)) = 2/6 = 1/3
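The two slope computations above can be checked with a short function (a minimal sketch; the helper name `slope` is mine, not from the notes):

```python
def slope(p, q):
    """Slope of the line through points p = (x1, y1) and q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 - x1)

# Through (-2, -2) and (1, 2): vertical change 4, horizontal change 3
print(slope((-2, -2), (1, 2)))  # 4/3

# Through (-4, -1) and (2, 1): 2/6 = 1/3
print(slope((-4, -1), (2, 1)))  # 1/3
```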
Positive slope: y = mx + b, m > 0
Negative slope: y = mx + b, m < 0
Zero slope: y = b
Undefined slope: x = a
LINEAR EQUATIONS
• Slope-intercept form: y = mx + b
Examples: y = –2x – 1 and y = ½x – 1
EXAMPLE
Find the line that passes through (−2, 6) and is parallel to y = (2/3)x − 5/3, and the line that passes through (−2, 6) and is perpendicular to it.

Parallel: same slope, m = 2/3
y = (2/3)x + b
(2/3)(−2) + b = 6
−4/3 + b = 6
b = 4/3 + 18/3 = 22/3
y = (2/3)x + 22/3

Perpendicular: negative reciprocal slope, m = −3/2
y = −(3/2)x + b
−(3/2)(−2) + b = 6
3 + b = 6
b = 6 − 3 = 3
y = −(3/2)x + 3
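Both intercepts above can be recovered with a small sketch (the helper name `line_through` is mine, not from the notes):

```python
def line_through(point, m):
    """Return (m, b) for the line y = m*x + b passing through the given point."""
    x0, y0 = point
    b = y0 - m * x0
    return m, b

# Parallel to y = (2/3)x - 5/3 through (-2, 6): same slope 2/3
m_par, b_par = line_through((-2, 6), 2/3)    # b = 6 + 4/3 = 22/3

# Perpendicular: negative reciprocal slope -3/2
m_perp, b_perp = line_through((-2, 6), -3/2)  # b = 6 - 3 = 3
```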
BREAK-EVEN ANALYSIS
• Linear cost function: C(x) = mx + b, where m is the marginal cost, b is the fixed cost, and x is the number of items produced
• Break-even point: the point where R(x) = C(x); it occurs where the revenue and cost lines intersect
EXAMPLE
The cost to produce x widgets is given by C(x) = 105x + 6000 and each widget sells for
$250. Determine the break-even quantity.
Solution:
R(x) = 250x

Set revenue equal to cost:
250x = 105x + 6000
145x = 6000
x ≈ 41.38

Check:
R(41) = 250(41) = 10,250 and C(41) = 105(41) + 6000 = 10,305
R(42) = 250(42) = 10,500 and C(42) = 105(42) + 6000 = 10,410

The break-even quantity is 42 widgets.
Note: Selling 41 widgets is not enough.
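The same computation as a minimal sketch (the function name is my own, not from the notes):

```python
import math

def break_even_quantity(price, marginal_cost, fixed_cost):
    """Smallest whole number of items at which revenue covers cost."""
    # Solve R(x) = C(x):  price*x = marginal_cost*x + fixed_cost
    x = fixed_cost / (price - marginal_cost)
    return math.ceil(x)

q = break_even_quantity(250, 105, 6000)
print(q)  # 42, since x = 6000/145 ≈ 41.38 and selling 41 is not enough
```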
LEAST SQUARES LINE
Minimize the sum of the squares of the vertical distances from the data points to the line
y = mx + b

where

m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)

and

b = (Σy − mΣx) / n
SCATTERPLOT
Income from side business. Let x represent the number of years since 1980 and y represent the income in thousands of dollars.
Year Income
1980 8,414
1985 9,124
1990 10,806
1995 12,321
2000 15,638
2005 18,242
2010 24,792
2015 25,436
LEAST SQUARES CALCULATIONS

   x       y        xy        x²      y²
   0       8.414    0         0       70.795396
   5       9.124    45.62     25      83.247376
  10      10.806   108.06    100     116.769636
  15      12.321   184.815   225     151.807041
  20      15.638   312.76    400     244.547044
  25      18.242   456.05    625     332.770564
  30      24.792   743.76    900     614.643264
  35      25.436   890.26   1225     646.990096
Σ: 140   124.773  2741.325  3500    2261.57042

m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
  = (8(2741.325) − 140(124.773)) / (8(3500) − 140²)
  = 0.5312

b = (Σy − mΣx) / n
  = (124.773 − 0.5312(140)) / 8
  = 6.3

y = 0.5312x + 6.3
GRAPH OF LEAST SQUARES LINE
LEAST SQUARES LINE PREDICTION
Use the least squares line y = 0.5312x + 6.3 to predict income in 2025: x = 45, so y = 0.5312(45) + 6.3 ≈ 30.2, i.e., about $30,200.

The correlation coefficient measures how well the line fits the data:

r = (nΣxy − ΣxΣy) / √((nΣx² − (Σx)²)(nΣy² − (Σy)²)) = 0.9691
PYTHON
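The slide title above suggests the notes included Python code here. A minimal sketch (my own, using the data table from the previous section) that reproduces the least-squares slope, intercept, and correlation coefficient:

```python
from math import sqrt

# Income data: x = years since 1980, y = income in thousands of dollars
xs = [0, 5, 10, 15, 20, 25, 30, 35]
ys = [8.414, 9.124, 10.806, 12.321, 15.638, 18.242, 24.792, 25.436]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)

m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b = (sy - m * sx) / n
r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

print(round(m, 4), round(b, 1), round(r, 4))  # 0.5312 6.3 0.9691

# Prediction for 2025 (x = 45), in thousands of dollars
print(round(m * 45 + b, 1))  # ≈ 30.2
```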
AVERAGE RATE OF CHANGE
The average rate of change of 𝑓(𝑥) with respect to 𝑥 as 𝑥 changes from 𝑎 to 𝑏 is
(f(b) − f(a)) / (b − a)
Based on population projections for 2000 to 2050, the projected Hispanic population (in millions) for a certain country can be modeled by the exponential function
H(t) = 37.791(1.021)^t
where 𝑡 = 0 corresponds to 2000 and 0 ≤ 𝑡 ≤ 50. Use 𝐻 to estimate the average rate of change in the Hispanic population from 2000 to 2010.
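The estimate asked for above can be sketched directly (the rounded answer below is my own computation from the given model):

```python
# Average rate of change of H(t) = 37.791 * 1.021**t (millions)
# from 2000 (t = 0) to 2010 (t = 10)
def H(t):
    return 37.791 * 1.021 ** t

avg_rate = (H(10) - H(0)) / (10 - 0)
print(round(avg_rate, 3))  # ≈ 0.873 million people per year
```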
How do we find the exact velocity of the car at, say, t = 10? Velocity represents both how fast something is moving and its direction, so velocity can be negative.

With position s(t) = 3t², the average velocity over the interval from t = 10 to t = 10 + h is

(s(10 + h) − s(10)) / h = (3(10 + h)² − 3(10)²) / h
                        = (3(100 + 20h + h²) − 300) / h
                        = (60h + 3h²) / h
                        = 60 + 3h

As h → 0, this approaches 60, the instantaneous velocity at t = 10.

Difference Quotient:
lim h→0 (f(a + h) − f(a)) / h

Alternate Form:
lim b→a (f(b) − f(a)) / (b − a)
For P(x) = 2x² − 5x + 6:

Average rate of change from x = 2 to x = 4:
(P(4) − P(2)) / (4 − 2) = ((2(4)² − 5(4) + 6) − (2(2)² − 5(2) + 6)) / 2
                        = (18 − 4) / 2 = 7
The average rate of change of profit from x = 2 to x = 4 is $700 per item.

Average rate of change from x = 2 to x = 3:
(P(3) − P(2)) / (3 − 2) = ((2(3)² − 5(3) + 6) − (2(2)² − 5(2) + 6)) / 1
                        = 9 − 4 = 5
The average rate of change of profit from x = 2 to x = 3 is $500 per item.

Instantaneous rate of change at x = 2:
lim h→0 (P(2 + h) − P(2)) / h = lim h→0 ((2(2 + h)² − 5(2 + h) + 6) − 4) / h
                             = lim h→0 (8 + 8h + 2h² − 10 − 5h + 6 − 4) / h
                             = lim h→0 (2h² + 3h) / h
                             = lim h→0 (2h + 3) = 3
The instantaneous rate of change of profit with respect to the number of items produced when x = 2 is $300 per item.
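The three rates above can be verified numerically; a minimal sketch, with the instantaneous rate approximated by a difference quotient at small h:

```python
def P(x):
    """Profit function from the worked example: P(x) = 2x^2 - 5x + 6."""
    return 2 * x**2 - 5 * x + 6

# Average rates of change match the hand computation
print((P(4) - P(2)) / (4 - 2))  # 7.0 -> $700 per item
print((P(3) - P(2)) / (3 - 2))  # 5.0 -> $500 per item

# Difference quotient with small h approximates the instantaneous rate at x = 2
h = 1e-6
print((P(2 + h) - P(2)) / h)    # ≈ 3 -> $300 per item
```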
SECANT AND TANGENT LINES
Slope of secant line = average rate of change:

(f(a + h) − f(a)) / h

Letting h → 0 gives the slope of the tangent line:

lim h→0 (f(a + h) − f(a)) / h

The derivative of f at x:

f′(x) = lim h→0 (f(x + h) − f(x)) / h
The function 𝑓′(𝑥) represents the instantaneous rate of change of 𝑦 = 𝑓(𝑥) with respect to 𝑥
The function 𝑓′(𝑥) represents the slope of the graph at any point 𝑥
The K-means algorithm is a popular unsupervised machine learning method used for
clustering data into K distinct groups based on feature similarity. Mathematically, it
partitions a set of n data points in a d-dimensional space into K clusters
{C1 , C2 , … , CK } such that the within-cluster sum of squares (WCSS) is minimized.
1. Mathematical Formulation
1.1. Data Representation
Let X = {x1, x2, …, xn} be a set of n data points, where each xi ∈ R^d is a d-dimensional vector. K-means chooses the clusters {Ck} and centroids {μk} to minimize

WCSS = Σ_{k=1..K} Σ_{xi ∈ Ck} ‖xi − μk‖²

where μk is the centroid (mean) of cluster Ck.

Subject to:
Ck ⊆ X for all k.
⋃_{k=1..K} Ck = X.
Ck ∩ Ck′ = ∅ for all k ≠ k′.
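The objective can be evaluated directly. A minimal sketch (the function name `wcss` is mine), using the small dataset that appears in the example of Section 5:

```python
def wcss(clusters, centroids):
    """Within-cluster sum of squares: sum of squared distances to centroids."""
    total = 0.0
    for points, mu in zip(clusters, centroids):
        for p in points:
            total += sum((pi - mi) ** 2 for pi, mi in zip(p, mu))
    return total

clusters = [[(1, 2), (1, 4), (1, 0)], [(10, 2)]]
centroids = [(1, 2), (10, 2)]
print(wcss(clusters, centroids))  # 8.0: squared distances 0 + 4 + 4 + 0
```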
2. Algorithm Steps
The K-means algorithm iteratively optimizes the objective function through two main
steps: Assignment and Update.
2.1. Initialization
Random Initialization: Select K distinct data points randomly from X as the initial centroids {μ1^(0), μ2^(0), …, μK^(0)}.
2.2. Assignment
Assign each point to its nearest centroid: at iteration t, xi belongs to Ck^(t) when ‖xi − μk^(t)‖² ≤ ‖xi − μj^(t)‖² for all j.
2.3. Update
Recompute each centroid as the mean of the points assigned to it:

μk^(t+1) = (1 / |Ck^(t)|) Σ_{xi ∈ Ck^(t)} xi   for each k = 1, 2, …, K

The two steps repeat until the assignments (equivalently, the centroids) no longer change.
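The steps above can be sketched in pure Python. This is a minimal illustration of Lloyd's algorithm, not a production implementation; the function name and convergence test are my own:

```python
def kmeans(points, centroids, max_iter=100):
    """Iterate assignment and update steps until the centroids stop moving."""
    def dist2(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            k = min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            clusters[k].append(p)

        # Update step: each centroid becomes the mean of its cluster
        new_centroids = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else mu
            for pts, mu in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # converged: centroids unchanged
            return clusters, centroids
        centroids = new_centroids
    return clusters, centroids

# The 4-point example from Section 5, with K = 2
points = [(1, 2), (1, 4), (1, 0), (10, 2)]
clusters, centroids = kmeans(points, [(1, 2), (10, 2)])
print(clusters)   # [[(1, 2), (1, 4), (1, 0)], [(10, 2)]]
print(centroids)  # [(1, 2), (10, 2)]
```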
3. Mathematical Properties
3.1. Convergence
K-means is guaranteed to converge to a local minimum of the WCSS objective
function. However, it may not find the global minimum due to its dependence on the
initial centroid positions.
3.3. Optimality
K-means solves the clustering problem via Lloyd's algorithm, which is an instance of
the Expectation-Maximization (EM) algorithm for Gaussian mixtures with equal
spherical covariance and equal priors. However, K-means assumes clusters are convex
and isotropic, which may not hold in all datasets.
5. Example
Consider a simple 2-dimensional dataset with n = 4 points:
x1 = (1, 2), x2 = (1, 4), x3 = (1, 0), x4 = (10, 2)
Let K = 2.
Initialization:
Suppose we randomly choose μ1^(0) = (1, 2) and μ2^(0) = (10, 2).
Assignment:
x1, x2, and x3 are closer to μ1^(0).
x4 is assigned to μ2^(0).
Update:
New μ1^(1) = (1/3)((1, 2) + (1, 4) + (1, 0)) = (1, 2)
New μ2^(1) = (10, 2) (unchanged)
Convergence:
Since the centroids did not change, the algorithm converges with the final
clusters:
C1 = {x1, x2, x3}
C2 = {x4}
6. Limitations
Choosing K: Determining the optimal number of clusters K is non-trivial and
often requires methods like the Elbow Method or Silhouette Analysis.
7. Conclusion
Mathematically, K-means is an iterative optimization algorithm aimed at partitioning
data into K clusters by minimizing the within-cluster variance. Its simplicity and
efficiency make it a widely used clustering technique, though it comes with
assumptions and limitations that must be considered in practical applications.