Assignment Problem - Gradient Descent in Python
X1 (Feature)    Y (Target)
1               4.8
3               12.4
5               15.5
• The following plot shows these 3 datapoints as blue circles. Also shown is the red line (with squares), which we claim is the "best-fit line".
• The claim is that this best-fit line will have the minimum error for prediction (the predicted values are actually the red squares, hence the vertical difference is the error in prediction).
• This total difference (error) across all the datapoints is expressed as the Mean Squared Error function, which will be minimized using the Gradient Descent algorithm, discussed below.
The net objective is to find the equation of the best-fitting straight line through these 3 datapoints (listed in the table above and represented by the blue circles in the plot).
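As a quick aside, the plot described above can be reproduced with a minimal matplotlib sketch. Note the assumption here that the red best-fit line is the ordinary least-squares fit, obtained via np.polyfit:

```python
import numpy as np
import matplotlib.pyplot as plt

# The 3 datapoints from the table above
X1 = np.array([1.0, 3.0, 5.0])
Y = np.array([4.8, 12.4, 15.5])

# Least-squares best-fit line (assumed to match the red line in the plot)
w1_best, w0_best = np.polyfit(X1, Y, 1)
Y_hat = w0_best + w1_best * X1

plt.plot(X1, Y, "bo", label="Data (blue circles)")
plt.plot(X1, Y_hat, "rs-", label="Best-fit line (red squares)")
# Vertical segments: the prediction error at each datapoint
plt.vlines(X1, Y, Y_hat, colors="gray", linestyles="dashed", label="Errors")
plt.xlabel("X1 (Feature)")
plt.ylabel("Y (Target)")
plt.legend()
plt.show()
```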
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(\mathrm{Error}_i)^2 = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{Y}_i - Y_i\right)^2 \qquad (3)$$
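Translated directly into Python, equation (3) might look like the following minimal sketch (the function name mse and its argument names are illustrative choices, not from the original):

```python
import numpy as np

# The 3 datapoints from the table above
X1 = np.array([1.0, 3.0, 5.0])
Y = np.array([4.8, 12.4, 15.5])

def mse(w0, w1, X, Y):
    """Mean Squared Error of the line Y_hat = w0 + w1*X, per equation (3)."""
    Y_hat = w0 + w1 * X           # predictions (the red squares)
    errors = Y_hat - Y            # per-point prediction errors
    return np.mean(errors ** 2)   # average of the squared errors
```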
Now the problem is to find the "optimal values" of the slope and intercept of this best-fit line, such that the Mean Squared Error (MSE) is minimum. You can easily see that the yellow line (a poor-fit line), which has non-optimal values of slope and intercept, fits the data very badly (the exact equation of the yellow line is y = x + 6, so the slope is 1 and the intercept is 6 units).
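Using the hypothetical mse helper from the sketch above, the badness of the yellow line can be quantified directly:

```python
# Yellow poor-fit line: Y_hat = 1*X1 + 6 (slope 1, intercept 6)
print(mse(w0=6.0, w1=1.0, X=X1, Y=Y))   # -> about 12.22
```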
How will we get the optimal values of the slope and intercept? They are obtained by iteratively updating the weights with the following rules:
$$w_0^{k+1} = w_0^{k} - \alpha \sum_{i=1}^{N} \left(\hat{Y}_i - Y_i\right) \qquad (4)$$

$$w_1^{k+1} = w_1^{k} - \alpha \sum_{i=1}^{N} \left[\left(\hat{Y}_i - Y_i\right) X_{1i}\right] \qquad (5)$$
where $w_0^k$ and $w_1^k$ represent the values of the intercept and the slope of the linear fit in the $k$th iteration, whereas $w_0^{k+1}$ and $w_1^{k+1}$ represent the values of the intercept and the slope in the $(k+1)$th iteration (the next iteration). $w_0$ and $w_1$ are also called the model weights or model coefficients.
$\alpha$ represents the learning rate.
The derivation of the above equations will be covered in the machine learning lessons.
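Pending that derivation, equations (4) and (5) translate into a single Python update step as sketched below (update_weights is an illustrative name; note that, as the equations are written, any 1/N or factor-of-2 term from the MSE gradient is absorbed into α):

```python
def update_weights(w0, w1, X, Y, alpha):
    """One gradient-descent step, per equations (4) and (5)."""
    Y_hat = w0 + w1 * X                       # current predictions
    errors = Y_hat - Y                        # (Y_hat_i - Y_i) terms
    w0_new = w0 - alpha * np.sum(errors)      # equation (4)
    w1_new = w1 - alpha * np.sum(errors * X)  # equation (5)
    return w0_new, w1_new
```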
Gradient Descent Algorithm
1. Initialize the algorithm with a choice of learning rate α and (random) initial values of the weights (w0, w1)
2. Calculate the predictions $\hat{Y} = w_0 + w_1 X_1$, as shown in equation (1)
3. Calculate the error terms and the MSE loss function (L): the error terms are $\sum_{i=1}^{N} (\hat{Y}_i - Y_i)$ and $\sum_{i=1}^{N} [(\hat{Y}_i - Y_i) X_{1i}]$ for the datapoints $i = 1$ to $N$ (here N is equal to 3)
4. Update the weights using equations (4) and (5), and repeat steps 2-4 until the loss stops decreasing
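Putting steps 1-4 together gives a short training loop, sketched below using the X1 and Y arrays defined earlier; running exactly two iterations (to match the tables that follow) is an illustrative choice:

```python
alpha = 0.01         # learning rate (step 1)
w0, w1 = 0.0, 0.0    # initial weights (step 1)

for k in range(2):
    Y_hat = w0 + w1 * X1                  # step 2: predictions
    errors = Y_hat - Y                    # step 3: error terms
    print(f"iteration {k+1}: sum(errors) = {np.sum(errors):.3f}, "
          f"sum(errors*X1) = {np.sum(errors * X1):.3f}, "
          f"MSE = {np.mean(errors ** 2):.3f}")
    w0 -= alpha * np.sum(errors)          # step 4: equation (4)
    w1 -= alpha * np.sum(errors * X1)     # step 4: equation (5)
```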
Based on the above-mentioned steps, we can calculate the weights by hand as well. Let the learning rate (α) be 0.01 and initialize the weights w0 and w1 to 0.
Iteration 1 (w0 = 0, w1 = 0):

X1     Y       Ŷ       Ŷ − Y      (Ŷ − Y)·X1
1      4.8     0.0     −4.8       −4.8
3      12.4    0.0     −12.4      −37.2
5      15.5    0.0     −15.5      −77.5
Sum                    −32.7      −119.5

Updated weights: w0 = 0 − 0.01·(−32.7) = 0.327 and w1 = 0 − 0.01·(−119.5) = 1.195

Iteration 2 (w0 = 0.327, w1 = 1.195):

X1     Y       Ŷ        Ŷ − Y       (Ŷ − Y)·X1
1      4.8     1.522    −3.278      −3.278
3      12.4    3.912    −8.488      −25.464
5      15.5    6.302    −9.198      −45.990
Sum                     −20.964     −74.732
As we can see, the sum of errors is decreasing as we update the weights. We can continue to update the weights in this manner until the sum of errors reaches its minimum (i.e., settles to an almost constant value).
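To see this convergence concretely, the same loop can be run until the loss essentially stops changing; the tolerance and iteration cap below are arbitrary sketch values. For this data the weights should settle near the least-squares solution, roughly w0 ≈ 2.875 and w1 ≈ 2.675:

```python
w0, w1 = 0.0, 0.0
prev_loss = float("inf")

for k in range(100_000):                 # generous iteration cap
    errors = (w0 + w1 * X1) - Y
    loss = np.mean(errors ** 2)          # MSE, equation (3)
    if abs(prev_loss - loss) < 1e-12:    # loss has flattened out
        break
    prev_loss = loss
    w0 -= alpha * np.sum(errors)         # equation (4)
    w1 -= alpha * np.sum(errors * X1)    # equation (5)

print(f"stopped after {k} iterations: w0 = {w0:.3f}, w1 = {w1:.3f}")
```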