Chapter 4 - Anatomy of a Learning Algorithm
2. An optimization criterion
3. An optimization routine
Gradient Descent
Gradient descent is an iterative optimization algorithm for finding the minimum of a
function. To find a local minimum of a function using gradient descent, one starts at some
random point and takes steps proportional to the negative of the gradient (or approximate
gradient) of the function at the current point.
Gradient descent can be used to find optimal parameters for linear and logistic regression,
SVM and also neural networks which we consider later. For many models, such as logistic
regression or SVM, the optimization criterion is convex. Convex functions have only one
minimum, which is global. Optimization criteria for neural networks are not convex, but in
practice even finding a local minimum suffices.
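As an illustration, gradient descent on a simple one-dimensional convex function can be sketched in a few lines. The function g(w) = (w − 3)², its gradient 2(w − 3), the starting point, and the learning rate below are illustrative choices, not from the text:

```python
# A minimal sketch of gradient descent on the convex function
# g(w) = (w - 3)^2, whose gradient is 2(w - 3). Starting point and
# learning rate are illustrative choices.

def gradient_descent(grad, w0, alpha=0.1, steps=100):
    """Repeatedly step in the direction of the negative gradient."""
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_min, 4))  # converges toward the minimum at w = 3
```

Because g is convex, the iterates approach the single global minimum regardless of the starting point, provided the learning rate is small enough.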
How Gradient Descent Works
The linear regression model looks like f(x) = wx + b, where w is called the weight and b is called the bias. To obtain the optimal model, we have to find the optimal values for both w and b: we look for the values of w and b that minimize the mean squared error:

l = \frac{1}{N} \sum_{i=1}^{N} (y_i - (wx_i + b))^2.
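The mean squared error translates directly into code; the toy dataset and candidate parameters below are illustrative, not from the text:

```python
# Mean squared error l for a candidate (w, b) on a toy dataset.
def mse(w, b, xs, ys):
    N = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / N

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]       # generated by y = 2x, so w = 2, b = 0 is optimal
print(mse(2.0, 0.0, xs, ys))  # 0.0: the line y = 2x fits exactly
print(mse(1.0, 0.0, xs, ys))  # a worse candidate gives a larger error
```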
Gradient descent starts by calculating the partial derivative of the criterion with respect to every parameter:
\frac{\partial l}{\partial w} = \frac{1}{N} \sum_{i=1}^{N} -2x_i (y_i - (wx_i + b));

\frac{\partial l}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} -2(y_i - (wx_i + b)).
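These two partial derivatives translate directly into code; the dataset values below are illustrative:

```python
# A direct translation of the two partial derivatives of the mean
# squared error with respect to w and b.
def gradients(w, b, xs, ys):
    N = len(xs)
    dl_dw = sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / N
    dl_db = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys)) / N
    return dl_dw, dl_db

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
dw, db = gradients(0.0, 0.0, xs, ys)
print(dw, db)  # both negative: the loss decreases as w and b increase
```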
To find the partial derivative of the term (y_i - (wx_i + b))^2 with respect to w, we applied the chain rule. Here we have the chain f = f_2(f_1), where f_1 = y_i - (wx_i + b) and f_2 = f_1^2. We first find the derivative of the outer function f_2 with respect to its argument f_1, which is equal to 2(y_i - (wx_i + b)), and then we multiply it by the partial derivative of f_1 with respect to w, which is equal to -x_i. So overall,

\frac{\partial l}{\partial w} = \frac{1}{N} \sum_{i=1}^{N} -2x_i (y_i - (wx_i + b)).
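One way to sanity-check the chain-rule result is a finite-difference comparison: the analytic gradient of l with respect to w should agree with the central difference (l(w + h) − l(w − h)) / 2h up to floating-point error. The parameter and data values below are illustrative:

```python
# Finite-difference check of the analytic gradient dl/dw.
def loss(w, b, xs, ys):
    N = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / N

def dl_dw(w, b, xs, ys):
    N = len(xs)
    return sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / N

xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
w, b, h = 0.5, 0.1, 1e-6
numeric = (loss(w + h, b, xs, ys) - loss(w - h, b, xs, ys)) / (2 * h)
print(abs(numeric - dl_dw(w, b, xs, ys)) < 1e-6)  # True
```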
We initialize w_0 = 0 and b_0 = 0 and then iterate through the training examples, updating w and b after each example using the partial derivatives. The learning rate \alpha controls the size of an update:

w_i \leftarrow w_{i-1} - \alpha \frac{\partial l}{\partial w}; \quad b_i \leftarrow b_{i-1} - \alpha \frac{\partial l}{\partial b},

where w_i and b_i denote the values of w and b after using the example (x_i, y_i) for the update.
One pass through all training examples is called an epoch.
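The per-example update loop described above can be sketched as follows; the learning rate, epoch count, and toy dataset are illustrative choices (the data are generated by y = 2x + 1, so the loop should recover w close to 2 and b close to 1):

```python
# Per-example gradient descent for linear regression: w and b start at
# zero and are updated after each training example; one pass over the
# data is one epoch. Hyperparameters here are illustrative choices.
def train_sgd(xs, ys, alpha=0.01, epochs=1000):
    w, b = 0.0, 0.0
    for _ in range(epochs):              # one pass over the data = one epoch
        for x, y in zip(xs, ys):
            err = y - (w * x + b)
            w = w + alpha * 2 * err * x  # w <- w - alpha * dl/dw
            b = b + alpha * 2 * err      # b <- b - alpha * dl/db
    return w, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated by y = 2x + 1
w, b = train_sgd(xs, ys)
print(round(w, 2), round(b, 2))  # converges toward w = 2, b = 1
```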
How Machine Learning Engineers Work
Machine learning engineers use libraries instead of implementing learning algorithms themselves. The most frequently used open-source library is scikit-learn:
def train(x, y):
    from sklearn.linear_model import LinearRegression
    # fit expects x with shape (n_samples, n_features)
    model = LinearRegression().fit(x, y)
    return model

model = train(x, y)

x_new = [[23.0]]   # predict also expects a 2D array
y_new = model.predict(x_new)
print(y_new)