0% found this document useful (0 votes)
33 views40 pages

Python Tut Gradient Descent Algos MLR - Jupyter Notebook

Uploaded by

Anvitha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
33 views40 pages

Python Tut Gradient Descent Algos MLR - Jupyter Notebook

Uploaded by

Anvitha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 40

Python Tutorial on Implementation of Gradient Desecent Optimization

algorithms from Scratch

Example : Implementation of Gradient Descent Algorithms from scratch in Python

Table of Contents
1. Create Data
2. Solve Direclty
3. Batch Gradient Descent
4. Stochastic Gradient Descent
5. Mini-batch Gradient Descent

In [1]:  1 import numpy as np


2 import matplotlib.pyplot as plt
3 ​
4 %matplotlib inline
1.Create Data
In [15]:  1 # generate random data with some noise
2 ​
3 X = 2 * np.random.rand(100, 1)
4 print(X.shape)
5 # print(X)
6 y = 8 +(3*X) + (np.random.randn(100, 1))
7 print(y.shape)
8 ​
9 ## For printing together input and target
10 X_y=np.c_[X,y]
11 print("Input and target")
12 print(X_y)
(100, 1)
(100, 1)
Input and target
[[ 0.04497682 9.10423587]
[ 0.44369684 9.91643706]
[ 1.14757428 11.57207686]
[ 1.98332104 12.02308413]
[ 0.03038902 9.2510614 ]
[ 1.06406009 10.54856201]
[ 0.76146702 11.78215315]
[ 0.92371199 9.46087616]
[ 1.14317242 11.56766946]
[ 1.55185131 12.12205791]
[ 0.0186962 7.2777673 ]
[ 1.1789661 11.50784111]
[ 0.12408403 7.41553359]
[ 1.19483023 12.57702559]
[ 1.59064174 14.23886903]
[ 1.40627822 12.4646239 ]
[ 0.41236386 9.66891799]
[ 0.01554289 7.50275661]
[ 1.24460193 12.63655415]
[ 0.59811763 11.0568907 ]
[ 0.41969231 9.65690536]
[ 1.36654913 12.13159462]
[ 1.64921642 13.65952935]
[ 0.35949318 6.70617922]
[ 0.02194567 8.71344024]
[ 0.19417357 7.8774343 ]
[ 0.43027749 8.26604118]
[ 0.01570807 7.77699381]
[ 1.8942345 14.27457547]
[ 1.84577681 13.78644959]
[ 1.0891461 11.9357483 ]
[ 0.94948277 10.53574713]
[ 1.8847801 13.73178354]
[ 1.26666735 11.90012728]
[ 0.40107006 7.84937596]
[ 1.44085866 11.71417226]
[ 1.61521082 13.30733268]
[ 1.50202438 11.5954719 ]
[ 0.88315884 11.46431646]
[ 0.47911189 9.47640518]
[ 0.30588616 10.13541128]
[ 1.52129161 12.84197568]
[ 0.25075942 7.88154753]
[ 0.90823295 10.79025615]
[ 1.63317685 12.44083037]
[ 1.68238394 14.15681462]
[ 1.36562465 12.50388206]
[ 0.77130311 12.89670151]
[ 1.10717202 10.36614913]
[ 0.23961462 7.03771564]
[ 0.55449143 8.97155405]
[ 1.96119551 13.11300052]
[ 1.02226458 10.56326872]
[ 0.13509331 9.68292579]
[ 0.27385538 7.45643523]
[ 0.68252181 10.8104117 ]
[ 0.47555176 9.06428252]
[ 1.09838153 11.87228133]
[ 1.08531632 11.92645865]
[ 0.75247426 8.81934867]
[ 1.99022675 15.21941186]
[ 1.99980551 12.91520817]
[ 1.5864137 12.94750455]
[ 1.66776477 13.05471928]
[ 0.39009362 9.5872106 ]
[ 0.06842432 6.52626061]
[ 0.49160231 8.96437506]
[ 1.89122417 13.97621451]
[ 0.29577358 9.39384895]
[ 1.21440789 11.27633599]
[ 1.79309467 11.8779532 ]
[ 1.63039219 12.65629986]
[ 0.81381444 11.28511091]
[ 1.0785014 11.64870513]
[ 0.65317283 10.92945199]
[ 1.39720473 10.49816875]
[ 0.72273399 10.0385813 ]
[ 1.56516876 11.22598596]
[ 1.79435949 13.69041692]
[ 0.92515284 10.69258089]
[ 0.92666572 11.16743518]
[ 1.79482058 14.3119088 ]
[ 0.83466322 10.25302971]
[ 0.89994989 9.90721344]
[ 0.05247682 7.3231602 ]
[ 1.34408815 12.47571037]
[ 1.32256478 11.96633382]
[ 0.9896771 11.92459708]
[ 0.38734407 9.25595385]
[ 0.62419774 9.34182876]
[ 1.03598244 9.13251898]
[ 1.52120598 12.7770783 ]
[ 0.16201475 7.62316748]
[ 1.5294854 12.93170731]
[ 0.77389001 11.13815733]
[ 1.43854668 14.43331059]
[ 0.77970708 9.08175759]
[ 0.42117791 9.99767758]
[ 0.38189651 8.82041598]
[ 0.45969414 11.29653945]]

In [16]:  1 # X=2*np.random.rand(100,1)
2 # print(X)
3 # print(X.shape)
4 # y=8+3*X+np.random.rand(100,1)
5 # print(y.shape)
In [17]:  1 # Scatter plot to see the relation between X and y
2 plt.figure(figsize=(15,10))
3 plt.scatter(X, y, marker='x')
4 plt.xlabel('X')
5 plt.ylabel('y', rotation=0)

Out[17]: Text(0, 0.5, 'y')


2. Solve using Numpy Library
In [18]:  1 # # Understanding numpy functions
2 # import numpy as np
3 # X = np.arange(6)
4 # print(X)
5 # X = X.reshape((6, 1))
6 # print(X)
7 # X1=np.ones_like(X) # Return an array of ones with the same shape and type as a given array.
8 # print(X1)
9 # z=np.c_[X1,X] # Translates slice objects to concatenation along the second axis.
10 # print(z)
In [19]:  1 # Adding bais unit to every vector in X
2 B=np.ones_like(X)
3 X_b=np.c_[B,X]
4 print(X_b)
5 print(X_b.shape)
6 ​
7 #y=y.reshape((6,2))
8 theta=np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
9 print(theta)
[[1. 0.04497682]
[1. 0.44369684]
[1. 1.14757428]
[1. 1.98332104]
[1. 0.03038902]
[1. 1.06406009]
[1. 0.76146702]
[1. 0.92371199]
[1. 1.14317242]
[1. 1.55185131]
[1. 0.0186962 ]
[1. 1.1789661 ]
[1. 0.12408403]
[1. 1.19483023]
[1. 1.59064174]
[1. 1.40627822]
[1. 0.41236386]
[1. 0.01554289]
[1. 1.24460193]
[1. 0.59811763]
[1. 0.41969231]
[1. 1.36654913]
[1. 1.64921642]
[1. 0.35949318]
[1. 0.02194567]
[1. 0.19417357]
[1. 0.43027749]
[1. 0.01570807]
[1. 1.8942345 ]
[1. 1.84577681]
[1. 1.0891461 ]
[1. 0.94948277]
[1. 1.8847801 ]
[1. 1.26666735]
[1. 0.40107006]
[1. 1.44085866]
[1. 1.61521082]
[1. 1.50202438]
[1. 0.88315884]
[1. 0.47911189]
[1. 0.30588616]
[1. 1.52129161]
[1. 0.25075942]
[1. 0.90823295]
[1. 1.63317685]
[1. 1.68238394]
[1. 1.36562465]
[1. 0.77130311]
[1. 1.10717202]
[1. 0.23961462]
[1. 0.55449143]
[1. 1.96119551]
[1. 1.02226458]
[1. 0.13509331]
[1. 0.27385538]
[1. 0.68252181]
[1. 0.47555176]
[1. 1.09838153]
[1. 1.08531632]
[1. 0.75247426]
[1. 1.99022675]
[1. 1.99980551]
[1. 1.5864137 ]
[1. 1.66776477]
[1. 0.39009362]
[1. 0.06842432]
[1. 0.49160231]
[1. 1.89122417]
[1. 0.29577358]
[1. 1.21440789]
[1. 1.79309467]
[1. 1.63039219]
[1. 0.81381444]
[1. 1.0785014 ]
[1. 0.65317283]
[1. 1.39720473]
[1. 0.72273399]
[1. 1.56516876]
[1. 1.79435949]
[1. 0.92515284]
[1. 0.92666572]
[1. 1.79482058]
[1. 0.83466322]
[1. 0.89994989]
[1. 0.05247682]
[1. 1.34408815]
[1. 1.32256478]
[1. 0.9896771 ]
[1. 0.38734407]
[1. 0.62419774]
[1. 1.03598244]
[1. 1.52120598]
[1. 0.16201475]
[1. 1.5294854 ]
[1. 0.77389001]
[1. 1.43854668]
[1. 0.77970708]
[1. 0.42117791]
[1. 0.38189651]
[1. 0.45969414]]
(100, 2)
[[7.87752609]
[3.11459601]]
In [20]:  1 # Adding bais unit to every vector in X
2 B= np.ones_like(X) # Return an array of ones with the same shape and type as a given array.
3 X_b = np.c_[B, X] # Translates slice objects to concatenation along the second axis.
4 # X_b = X
5 print(X_b)
6 theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y) # Compute the (multiplicative) inverse of a matrix.
7 ​
8 # Values are very close to are thetas 8 and 3
9 print(theta)
10 ​
[[1. 0.04497682]
[1. 0.44369684]
[1. 1.14757428]
[1. 1.98332104]
[1. 0.03038902]
[1. 1.06406009]
[1. 0.76146702]
[1. 0.92371199]
[1. 1.14317242]
[1. 1.55185131]
[1. 0.0186962 ]
[1. 1.1789661 ]
[1. 0.12408403]
[1. 1.19483023]
[1. 1.59064174]
[1. 1.40627822]
[1. 0.41236386]
[1. 0.01554289]
[1. 1.24460193]
[1. 0.59811763]
[1. 0.41969231]
[1. 1.36654913]
[1. 1.64921642]
[1. 0.35949318]
[1. 0.02194567]
[1. 0.19417357]
[1. 0.43027749]
[1. 0.01570807]
[1. 1.8942345 ]
[1. 1.84577681]
[1. 1.0891461 ]
[1. 0.94948277]
[1. 1.8847801 ]
[1. 1.26666735]
[1. 0.40107006]
[1. 1.44085866]
[1. 1.61521082]
[1. 1.50202438]
[1. 0.88315884]
[1. 0.47911189]
[1. 0.30588616]
[1. 1.52129161]
[1. 0.25075942]
[1. 0.90823295]
[1. 1.63317685]
[1. 1.68238394]
[1. 1.36562465]
[1. 0.77130311]
[1. 1.10717202]
[1. 0.23961462]
[1. 0.55449143]
[1. 1.96119551]
[1. 1.02226458]
[1. 0.13509331]
[1. 0.27385538]
[1. 0.68252181]
[1. 0.47555176]
[1. 1.09838153]
[1. 1.08531632]
[1. 0.75247426]
[1. 1.99022675]
[1. 1.99980551]
[1. 1.5864137 ]
[1. 1.66776477]
[1. 0.39009362]
[1. 0.06842432]
[1. 0.49160231]
[1. 1.89122417]
[1. 0.29577358]
[1. 1.21440789]
[1. 1.79309467]
[1. 1.63039219]
[1. 0.81381444]
[1. 1.0785014 ]
[1. 0.65317283]
[1. 1.39720473]
[1. 0.72273399]
[1. 1.56516876]
[1. 1.79435949]
[1. 0.92515284]
[1. 0.92666572]
[1. 1.79482058]
[1. 0.83466322]
[1. 0.89994989]
[1. 0.05247682]
[1. 1.34408815]
[1. 1.32256478]
[1. 0.9896771 ]
[1. 0.38734407]
[1. 0.62419774]
[1. 1.03598244]
[1. 1.52120598]
[1. 0.16201475]
[1. 1.5294854 ]
[1. 0.77389001]
[1. 1.43854668]
[1. 0.77970708]
[1. 0.42117791]
[1. 0.38189651]
[1. 0.45969414]]
[[7.87752609]
[3.11459601]]

3. Batch Gradient Decent Algorithm


Below mentioned Cost Function which stands for Mean Squared Error(MSE) is for Linear Regression and Gradient is derived from the it.

Cost
𝑚
𝐽(𝜃) = 1/𝑚 ∑(ℎ(𝜃)(𝑖) − 𝑦(𝑖) )2
𝑖=1
Gradient

∂𝐽(𝜃) = 2/𝑚 𝑚 (ℎ(𝜃(𝑖) ) − 𝑦(𝑖) ). 𝑋𝑗(𝑖)


∂𝜃𝑗 ∑𝑖=1
Below mentioned Cost Function which stands for Mean Squared Error(MSE) is for Linear Regression and Gradient is derived from the it.

Cost
𝑚
𝐽(𝜃) = 1/2𝑚 ∑(ℎ(𝜃)(𝑖) − 𝑦(𝑖) )2
𝑖=1
Gradient

∂𝐽(𝜃) = 1/𝑚 𝑚 (ℎ(𝜃(𝑖) ) − 𝑦(𝑖) ). 𝑋𝑗(𝑖)


∂𝜃𝑗 ∑𝑖=1

Function for computing Cost function


In [21]:  1 def cal_cost(theta, X, y):
2 '''
3 Calculates the cost for given X and y.
4
5
6 Parameters
7 ----------
8 theta: Vector of thetas
9 X: Row of X's
10 y: Actual y's
11
12 Returns
13 -------
14 Calculated cost
15 '''
16 m = len(y)
17 predictions = np.dot(X, theta)
18 cost = (1/ m) * np.sum(np.square(predictions - y))
19
20 return cost
Function for Batch gradient Decent Algorithm
In [22]:  1 def gradient_descent(X, y, theta, learning_rate, iterations):
2 '''
3 Parameters
4 ----------
5 X: Matrix of X with bais units added to every vector of X
6 y: Vector of y
7 theta: Vector initialized randomly
8 learning_rate: learning rate that will be used as step size
9 iterations = Number of iterations
10
11 Returns
12 -------
13 The final theta vector, array of cost and theta history over no of iterations
14 '''
15
16 m = len(y)
17 cost_history = np.zeros(iterations)
18 theta_history = np.zeros((iterations, 2))
19
20 for i in range(iterations):
21 #Forward propagation
22 predictions = np.dot(X, theta)
23 #Compute Cost (Value of loss function)
24 cost_history[i] = cal_cost(theta, X, y)
25 # Backpropogation
26 grad =(2/m) * (X.T.dot((predictions - y)))
27 # Update Parameters
28 theta = theta - (learning_rate * grad)
29 theta_history[i, :] = theta.T
30
31
32 return theta, cost_history, theta_history
Calling function

In [23]:  1 # Using gradient descent to find out the thetas between our X and y relation
2 import time
3 lr = 0.3
4 n_iter = 100
5 ​
6 theta = np.random.randn(2,1)
7 print("Initial value of weights")
8 print(theta)
9 X_b = np.c_[np.ones((len(X),1)),X]
10 ​
11 ## Calling BGD function
12 t10 = time.time()
13 theta,cost_history,theta_history = gradient_descent(X_b, y, theta, lr, n_iter)
14 t11 = time.time()
15 time_BGD = t11 - t10
16 print("Processing Time of MBG Algorithm")
17 print(time_BGD)
18 ​
19 ​
20 # Predicted value by NNs
21 y_predicted= np.dot(X_b, theta)
22 print("Optimal value of weights")
23 print('{:<10}{:.3f}'.format('Theta0:',theta[0][0]))
24 print('{:<10}{:.3f}'.format('Theta1:',theta[1][0]))
25 print("Minimum value of cost function")
26 print('{:<10}{:.3f}'.format('Cost/MSE:',cost_history[-1]))
27 ​
28 # print(y_predicted)
29 ​
30 predicted__actual = np.c_[y_predicted, y]
31 print("Comparing predicted vs actual")
32 print(predicted__actual)
[ 9.25939063 9.91643706]
[11.45175653 11.57207686]
[14.0548556 12.02308413]
[ 7.97206146 9.2510614 ]
[11.19163502 10.54856201]
[10.24914889 11.78215315]
[10.75449303 9.46087616]
[11.43804608 11.56766946]
[12.71095756 12.12205791]
[ 7.93564186 7.2777673 ]
[11.5495326 11.50784111]
[ 8.26389315 7.41553359]
[11.59894458 12.57702559]
[12.83177803 14.23886903]
[12.25754129 12.4646239 ]
[ 9.16179784 9.66891799]
[ 7.92582023 7.50275661]
[11.75396842 12.63655415]
[ 9.74036482 11.0568907 ]
[ 9.18462378 9.65690536]
[12 13379715 12 13159462]
In [24]:  1 # plotting the cost history vs number of iterations
2 ​
3 plt.figure(figsize=(15,5))
4 plt.scatter(range(n_iter), cost_history)
5 plt.xlabel('Number of Iterations')
6 plt.ylabel('Cost', rotation=0)
7 ​

Out[24]: Text(0, 0.5, 'Cost')


In [25]:  1 # plot prediction line
2 plt.figure(figsize=(15,5))
3 plt.scatter(X, y, marker='x')
4 plt.plot(X, y_predicted, 'red')
5 plt.xlabel('X')
6 plt.ylabel('y', rotation=0)

Out[25]: Text(0, 0.5, 'y')


4. Stochastic Gradient Descent (SGD) Algorithm
In [26]:  1 def stochastic_gradient_descent(X, y, theta, learning_rate,iterations):
2 '''
3 Parameters
4 ----------
5 X: Matrix of X with bais units added to every vector of X
6 y: Vector of y
7 theta: Vector initialized randomly
8 learning_rate: learning rate that will be used as step size
9 iterations = Number of iterations
10
11 Returns
12 -------
13 The final theta vector, array of cost and theta history over no of iterations
14 '''
15 m = len(y)
16 theta_history = np.zeros((iterations, 2))
17 cost_history = np.zeros(iterations)
18 # theta_history = []
19
20 for i in range(iterations):
21 cost_per_iteration = .0
22 for j in range(m):
23 # #Step : Shuffle (X,y)
24 X_rand_idx = np.random.randint(m)
25 X_inner = X[X_rand_idx].reshape(1, X.shape[1])
26 y_inner = y[X_rand_idx].reshape(1, 1)
27
28 # Forward Propgation
29 predictions = np.dot(X_inner, theta)
30
31 # Computing cost value
32 cost_per_iteration += cal_cost(theta, X_inner, y_inner)
33
34 # Backpropagation
35 grad= (2/m) * (X_inner.T.dot((predictions - y_inner)))
36
37 # Weight updation
38 theta = theta - (learning_rate * grad)
39
40 cost_history[i] = cost_per_iteration
41
42
43 return theta, cost_history, theta_history
In [31]:  1 # Using stochastic gradient descent to find out the thetas between our X and y relation
2 import time
3 lr = 0.3
4 n_iter = 100
5 theta = np.random.randn(2,1)
6 print("Initial value of weights")
7 print(theta)
8 ​
9 X_b = np.c_[np.ones((len(X),1)),X]
10 ​
11 ## calling SGD
12 t20 = time.time()
13 theta,cost_history,theta_history = stochastic_gradient_descent(X_b, y, theta, lr, n_iter)
14 t21 = time.time()
15 time_SGD = t21 - t20
16 print("Processing Time of SGD Algorithm")
17 print(time_SGD)
18 # theta,cost_history,theta_history = stochastic_gradient_descent(X_b, y, theta, lr, n_iter)
19 print(theta)
20 ​
21 ​
22 # Predicted value by NNs
23 y_predicted= np.dot(X_b, theta)
24 ​
25 print("Optimal value of weights")
26 print('{:<10}{:.3f}'.format('Theta0:',theta[0][0]))
27 print('{:<10}{:.3f}'.format('Theta1:',theta[1][0]))
28 print("Minimum value of cost function")
29 print('{:<10}{:.3f}'.format('Cost/MSE:',cost_history[-1]))
30 ​
31 ​
32 # print(y_predicted)
33 predicted__actual = np.c_[y_predicted, y]
34 print("Comparing predicted vs actual")
35 print(predicted__actual)
36 ​
37 ​
[ 8.52699622 7.8774343 ]
[ 9.26269015 8.26604118]
[ 7.9709022 7.77699381]
[13.82434371 14.27457547]
[13.67335075 13.78644959]
[11.31570832 11.9357483 ]
[10.88052087 10.53574713]
[13.79488402 13.73178354]
[11.86886012 11.90012728]
[ 9.17168056 7.84937596]
[12.41163588 11.71417226]
[12.95491287 13.30733268]
[12.60222674 11.5954719 ]
[10.67385716 11.46431646]
[ 9.41485695 9.47640518]
[ 8.87508987 10.13541128]
[12.66226294 12.84197568]
[ 8.70331633 7.88154753]
[10.75198745 10.79025615]
[13.01089457 12.44083037]
[13 16422264 14 15681462]
In [32]:  1 # def stochastic_gradient_descent(X, y, theta, learning_rate,iterations):
2 # '''
3 # Parameters
4 # ----------
5 # X: Matrix of X with bais units added to every vector of X
6 # y: Vector of y
7 # theta: Vector initialized randomly
8 # learning_rate: learning rate that will be used as step size
9 # iterations = Number of iterations
10
11 # Returns
12 # -------
13 # The final theta vector, array of cost and theta history over no of iterations
14 # '''
15
16 # m = len(y)
17 # theta_history = np.zeros((iterations, 2))
18 # cost_history = np.zeros(iterations)
19 # theta_history = []
20
21 # for i in range(iterations):
22 # cost_per_iteration = .0
23 # for j in range(m):
24 # # #Step : Shuffle (X,y)
25 # X_rand_idx = np.random.randint(m)
26 # X_inner = X[X_rand_idx].reshape(1, X.shape[1])
27 # y_inner = y[X_rand_idx].reshape(1, 1)
28
29 # predictions = np.dot(X_inner, theta)
30
31 # theta = theta - (2/m) * learning_rate * (X_inner.T.dot((predictions - y_inner)))
32 # cost_per_iteration += cal_cost(theta, X_inner, y_inner)
33 # cost_history[i] = cost_per_iteration
34
35 # return theta, cost_history, theta_history
In [20]:  1 # plotting the cost history vs number of iterations
2 ​
3 plt.figure(figsize=(15,5))
4 plt.scatter(range(n_iter), cost_history)
5 plt.xlabel('Number of Iterations')
6 plt.ylabel('Cost', rotation=0)
7 ​

Out[20]: Text(0, 0.5, 'Cost')


In [20]:  1 # plot prediction line
2 plt.figure(figsize=(15,5))
3 plt.scatter(X, y, marker='x')
4 plt.plot(X, np.dot(X_b, theta), 'red')
5 plt.xlabel('X')
6 plt.ylabel('y', rotation=0)

Out[20]: Text(0, 0.5, 'y')


5. Mini-batch Gradient Descent (MBGD) Algorithm
In [27]:  1 def mini_batch_gradient_descent(X, y, theta, learning_rate, iterations):
2 '''
3 Parameters
4 ----------
5 X: Matrix of X with bais units added to every vector of X
6 y: Vector of y
7 theta: Vector initialized randomly
8 learning_rate: learning rate that will be used as step size
9 iterations = Number of iterations
10
11 Returns
12 -------
13 The final theta vector, array of cost and theta history over no of iterations
14 '''
15
16 m = len(y)
17 cost_history = np.zeros(iterations)
18 # print(cost_history)
19 theta_history = np.zeros((iterations, 2))
20 # print(theta_history)
21 # Batch size
22 batch_size=10
23 for i in range(iterations):
24 cost_per_iteration = .0
25 # #Step 1: Shuffle (X,y)
26 rand_indicies = np.random.permutation(m)
27 X = X[rand_indicies]
28 y = y[rand_indicies]
29
30 for j in range(0, m, batch_size):
31 X_inner = X[i: i+batch_size]
32 y_inner = y[i: i+batch_size]
33 # forward propagation
34 predictions = np.dot(X_inner, theta)
35 # # weight updataion equation
36 # theta = theta - (2/m) * learning_rate * (X_inner.T.dot((predictions - y_inner)))
37 # cost_per_iteration += cal_cost(theta, X_inner, y_inner)
38 # Computing cost value
39 cost_per_iteration += cal_cost(theta, X_inner, y_inner)
40 # Backpropagation
41 grad= (2/m) * (X_inner.T.dot((predictions - y_inner)))
42 # Weight updation
43 theta = theta - (learning_rate * grad)
44 cost_history[i] = cost_per_iteration
45
46 return theta, cost_history, theta_history
In [28]:  1 # Using mini batch gradient descent to find out the thetas between our X and y relation
2 lr = 0.3
3 n_iter = 100
4 theta = np.random.randn(2,1)
5 print("Initial value of weights")
6 print(theta)
7 ​
8 # print(X_b)
9 X_b = np.c_[np.ones((len(X),1)),X]
10 # print(X_b)
11 ​
12 ## calling MBGD
13 t30 = time.time()
14 theta,cost_history,theta_history = mini_batch_gradient_descent(X_b, y, theta, lr, n_iter)
15 t31 = time.time()
16 time_MBGD = t31 - t30
17 print("Processing Time of MBGD Algorithm")
18 print(time_MBGD)
19 # theta,cost_history,theta_history = mini_gradient_descent(X_b, y, theta, lr, n_iter)
20 # print(theta)
21 ​
22 # Predicted value by NNs
23 y_predicted= np.dot(X_b, theta)
24 ​
25 print("Optimal value of weights")
26 print('{:<10}{:.3f}'.format('Theta0:',theta[0][0]))
27 print('{:<10}{:.3f}'.format('Theta1:',theta[1][0]))
28 print("Minimum value of cost function")
29 print('{:<10}{:.3f}'.format('Cost/MSE:',cost_history[-1]))
30 ​
31 # print(y_predicted)
32 predicted__actual = np.c_[y_predicted, y]
33 print("Comparing predicted vs actual")
34 print(predicted__actual)
[12.12343891 12.13159462]
[13.04414139 13.65952935]
[ 8.84326137 6.70617922]
[ 7.74380331 8.71344024]
[ 8.30478315 7.8774343 ]
[ 9.07381964 8.26604118]
[ 7.72348624 7.77699381]
[13.84221304 14.27457547]
[13.68437689 13.78644959]
[11.21988316 11.9357483 ]
[10.76497248 10.53574713]
[13.8114182 13.73178354]
[11.79810449 11.90012728]
[ 8.97868538 7.84937596]
[12.36547953 11.71417226]
[12.93337851 13.30733268]
[12.5647082 11.5954719 ]
[10.5489425 11.46431646]
[ 9.23288283 9.47640518]
[ 8.66865284 10.13541128]
In [29]:  1 # plotting the cost history vs number of iterations
2 ​
3 plt.figure(figsize=(15,5))
4 plt.scatter(range(n_iter), cost_history)
5 plt.xlabel('Number of Iterations')
6 plt.ylabel('Cost', rotation=0)
7 ​

Out[29]: Text(0, 0.5, 'Cost')


In [30]:  1 # plot prediction line
2 ​
3 plt.figure(figsize=(15,5))
4 plt.scatter(X, y, marker='x')
5 plt.plot(X, np.dot(X_b, theta), 'red')
6 plt.xlabel('X')
7 plt.ylabel('y', rotation=0)

Out[30]: Text(0, 0.5, 'y')

Comparison of Processing Speed of above three Grandient Decent


Optimization Algorithms Implementation
In [33]:  1 import matplotlib.pyplot as plt
2 %matplotlib inline
3 # plt.style.use('ggplot')
4 ​
5 x = ['Method-1', 'Method-2', 'Method-3']
6 Processing_Times = [time_BGD, time_SGD, time_MBGD]
7
8 ​
9 x_pos = [i for i, _ in enumerate(x)]
10 ​
11 plt.bar(x_pos, Processing_Times, color='green')
12 plt.xlabel("<-------------Method-----------> ")
13 plt.ylabel("<----------Time-------------> ")
14 plt.title(" Time Vs Method ")
15 ​
16 plt.xticks(x_pos, x)
17 ​
18 plt.show()
In [ ]:  1 ​

You might also like