
Gradient Descent Calculation

The document describes the process of gradient descent for linear regression. It defines the variables used in gradient descent including X (training examples), y (labels), θ (parameters), α (learning rate), m (number of examples). It shows that θ is updated by subtracting a term containing the product of the learning rate, the transpose of X, and the difference between the predicted and actual values (errors). This minimizes the errors in the predictions to optimize the model.

Uploaded by

Arindam Sen

theta = theta - (alpha/m) * (X' * (X * theta - y))

Assume that the following values of X, y and θ are given:

• m = number of training examples
• n = number of features + 1

Here

• m = 5 (training examples)
• n = 4 (features + 1)
• X = m x n matrix
• y = m x 1 column vector
• θ = n x 1 column vector
• xi is the i-th training example (the i-th row of X)
• xj is the j-th feature in a given training example

Further,

• h(x) = X * θ (m x 1 vector of predicted values for our training set)
• E = h(x) - y = X * θ - y (m x 1 vector of errors in our predictions)
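The two definitions above can be sketched in a few lines of plain Python. The numbers in X, y and theta are made up for illustration (the source gives no concrete values), and the leading column of ones in X is an assumption about what the "+1" in n stands for. The helper names predict and errors are hypothetical.

```python
# Illustrative values only (not from the source): m = 5 examples, n = 4 columns.
# The first column of ones is an assumed intercept term.
X = [
    [1, 2, 1, 0],
    [1, 0, 3, 1],
    [1, 1, 1, 1],
    [1, 4, 0, 2],
    [1, 3, 2, 1],
]
y = [5, 4, 3, 7, 6]
theta = [1, 1, 0, 1]

def predict(X, theta):
    """h(x) = X * theta: one predicted value per training example."""
    return [sum(x_ij * t_j for x_ij, t_j in zip(row, theta)) for row in X]

def errors(X, theta, y):
    """E = X * theta - y: one error per training example."""
    return [h_i - y_i for h_i, y_i in zip(predict(X, theta), y)]

h = predict(X, theta)   # m predicted values
E = errors(X, theta, y) # m errors
print(h)
print(E)
```

Both results have m = 5 entries, matching the m x 1 shapes listed above.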

The objective of training is to minimize the errors in predictions. Based on the above
definitions, our errors matrix E is an m x 1 column vector as follows:

E = [e1; e2; e3; e4; e5], where ei = h(xi) - yi
To calculate the new value of θj, we sum all m errors, each multiplied by the jth feature
value of the corresponding training example in X. That is, take every value in E, multiply
it by the jth feature of the corresponding training example, and add the products together.
This gives us the new (and hopefully better) value of θj. Repeat this process for every j,
that is, for each of the n columns of X. In matrix form, this can be written as:

θ = θ - (α/m) * [Σ ei*xi1; Σ ei*xi2; ...; Σ ei*xin]   (each sum runs over i = 1..m)

This can be simplified as:

θ = θ - (α/m) * ([E]' x [X])'

• [E]' x [X] gives a row vector, since E' is a 1 x m matrix and X is an m x n
matrix, so the product is 1 x n. θ is an n x 1 column vector, so we transpose the
result to get a column vector of the same shape.
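As a rough check of the argument above, the sketch below computes the per-feature sums with an explicit loop and compares them with the transposed product ([E]' x [X])'. The values of X and E are illustrative, not taken from the source, and the helpers transpose and matmul are hypothetical names written here in plain Python.

```python
# Illustrative values only (not from the source).
X = [
    [1, 2, 1, 0],
    [1, 0, 3, 1],
    [1, 1, 1, 1],
    [1, 4, 0, 2],
    [1, 3, 2, 1],
]
E = [-2, -2, 0, 0, -1]  # assumed errors, one per training example
m, n = len(X), len(X[0])

# Summation form: for each feature j, add up e_i * x_ij over all m rows.
grad_loop = [sum(E[i] * X[i][j] for i in range(m)) for j in range(n)]

def transpose(A):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

E_col = [[e] for e in E]                # E as an m x 1 column
row_vec = matmul(transpose(E_col), X)   # E' * X is 1 x n
col_vec = transpose(row_vec)            # (E' * X)' is n x 1

print(grad_loop)
print(row_vec)
print(col_vec)
```

The loop and the matrix product hold the same n numbers; only the shape differs, which is why the transpose is needed before subtracting from θ.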

More succinctly, it can be written as:

theta = theta - (alpha/m) * (E' * X)'

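Since (A * B)' = (B' * A'), and A'' = A, we can also write the above as

theta = theta - (alpha/m) * (X' * E)

Substituting E = X * theta - y gives the expression we started with.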
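A minimal sketch of one complete update step follows. It first checks the transpose identity on sample numbers, then applies theta - (alpha/m) * (X' * (X * theta - y)) once. All values, including alpha = 0.1, are assumptions chosen for illustration, and transpose, matmul and gradient_step are hypothetical helper names.

```python
def transpose(A):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def gradient_step(X, y, theta, alpha):
    """One update: theta - (alpha/m) * (X' * (X * theta - y)).

    X is m x n; y is m x 1; theta is n x 1 (all lists of rows).
    """
    m = len(X)
    E = [[h[0] - t[0]] for h, t in zip(matmul(X, theta), y)]  # m x 1 errors
    grad = matmul(transpose(X), E)                            # n x 1, X' * E
    return [[t[0] - (alpha / m) * g[0]] for t, g in zip(theta, grad)]

# Illustrative values only (not from the source).
X = [[1, 2, 1, 0], [1, 0, 3, 1], [1, 1, 1, 1], [1, 4, 0, 2], [1, 3, 2, 1]]
y = [[5], [4], [3], [7], [6]]
theta = [[1], [1], [0], [1]]

# Check the identity (E' * X)' == X' * E on these numbers.
E = [[h[0] - t[0]] for h, t in zip(matmul(X, theta), y)]
lhs = transpose(matmul(transpose(E), X))
rhs = matmul(transpose(X), E)
print(lhs == rhs)

theta_new = gradient_step(X, y, theta, alpha=0.1)
print(theta_new)
```

Calling gradient_step repeatedly, feeding each result back in as theta, is the usual way to iterate; the number of iterations and the learning rate are left to the reader.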
