ML Lecture 2 2023
For example, in this simple bouncing ball game, whenever the user presses the right or left arrow key, the paddle moves a few pixels to the right or left, respectively. The user uses the paddle to catch the ball so that it does not hit the bottom line. Every time the ball hits the paddle, it bounces off the paddle. If the ball hits the bottom of the screen, the player loses a life.
In traditional programming, rules and data go in and answers come out. Rules are expressed in code, and data can come from a variety of sensors, past experiences, experiments, or historical data. Machine learning rearranges this process: data and answers go in, and rules come out.
https://www.chegg.com/homework-help/questions-and-answers/python3-assignment-start-solution-bouncing-ball-lab-implement-simple-pong-game-interface-s-q15926071
Introduction to Numerical Prediction
Machine Learning
X1 X2 X3 Y
4 -2 6 8
2 8 2 4
6 10 3 0
Solve Using Right Division
Use matrix operations to solve the following system of linear equations.
X A = B
In this equation X and B are row vectors. The equation can be solved by multiplying both sides, on the right, by the inverse of A:

X A = B
X A A⁻¹ = B A⁻¹
X I = B A⁻¹
X = B A⁻¹

Solution with Python

import numpy as np
A = np.array([[4, 2, 6],
              [-2, 8, 10],
              [6, 2, 3]])
B = np.array([[8, 4, 0]])
print('shape of A:', A.shape)
print('shape of B:', B.shape)
coefficients = np.dot(B, np.linalg.inv(A))
print(coefficients)

Output:
[[-1.80487805  0.29268293  2.63414634]]
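As a quick check (a minimal sketch using the A and B defined above), the computed X can be substituted back into X·A and compared with B:

X = np.dot(B, np.linalg.inv(A))
# X·A should reproduce B (up to floating-point error)
print(np.allclose(np.dot(X, A), B))   # True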
Solve Using Left Division
Use matrix operations to solve the system of linear equations A X = B.
In this equation X and B are column vectors. The equation can be solved by multiplying both sides, on the left, by the inverse of A:

A X = B
A⁻¹ A X = A⁻¹ B
I X = A⁻¹ B
X = A⁻¹ B

Solution with Python

import numpy as np
A = np.array([[4, -2, 6],
              [2, 8, 2],
              [6, 10, 3]])
B = np.array([[8], [4], [0]])
coefficients = np.linalg.inv(A).dot(B)
print(coefficients)

Output:
[[-1.80487805]
 [ 0.29268293]
 [ 2.63414634]]
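The same column-vector solution can also be obtained without forming the inverse explicitly; a minimal sketch using np.linalg.solve with the same A and B:

import numpy as np
A = np.array([[4, -2, 6],
              [2, 8, 2],
              [6, 10, 3]])
B = np.array([[8], [4], [0]])
# Solves A·X = B directly
X = np.linalg.solve(A, B)
print(X)   # same result as np.linalg.inv(A).dot(B)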
Use matrix operations to solve a system of linear equations:
Left division:  A X = B  →  X = inv(A) * B
Right division: X A = B  →  X = B * inv(A)
ARRAY (and Matrix) DIVISION
Identity matrix:
The identity matrix is a square matrix in which the main diagonal elements are 1s
and the rest of the elements are 0s.
When the identity matrix multiplies another matrix (or vector), that matrix (or
vector) is unchanged.
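A minimal sketch illustrating this property with np.eye (the 3×3 matrix A is only an example):

import numpy as np
I = np.eye(3)   # 3x3 identity matrix
A = np.array([[2, 1, 4],
              [4, 1, 8],
              [2, -1, 3]])
print(np.dot(I, A))   # equal to A
print(np.dot(A, I))   # also equal to A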
The inverse of a matrix A is typically written as A⁻¹. The matrix A⁻¹ is the inverse of the matrix A if, when the two matrices are multiplied, the product is the identity matrix. Both matrices must be square, and the multiplication order can be A⁻¹A or AA⁻¹.
A⁻¹ · A = A · A⁻¹ = I
In Python, the inverse of a matrix can be obtained with the NumPy library function np.linalg.inv(A).
Examples with Python

If A is
 2   1   4
 4   1   8
 2  -1   3

then inv(A) is
 5.5000  -3.5000   2.0000
 2.0000  -1.0000   0
-3.0000   2.0000  -1.0000

import numpy as np
A = np.array([[2, 1, 4],
              [4, 1, 8],
              [2, -1, 3]])
B = np.linalg.inv(A)
print('Matrix A:\n', A)
print('Inverse A:\n', B)
I = np.dot(A, B)
print('Identity matrix of A:\n', I)

The identity matrix I = A · inv(A) is
1  0  0
0  1  0
0  0  1
Simple Linear Regression
Linear regression models a straight-line relationship between a scalar response (the dependent variable, y) and one or more explanatory variables (the independent variables, x). For a single explanatory variable this is a linear relationship between x and y:
y = mx + c
where y is the dependent variable, x is the independent variable, m is the slope of the line and c is the y-intercept (the value of y when x = 0). m is also called the gradient, because it is the ratio of the change in y to the change in x.
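For example, for the two (hypothetical) points (1, 3) and (2, 5):
m = (5 − 3) / (2 − 1) = 2 and c = 3 − 2·1 = 1, so the line is y = 2x + 1.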
Over n samples, the prediction error (cost) is the mean of the squared differences between the target and predicted outputs:

error(p) = (1/n) Σᵢ (tᵢ − pᵢ)²,  with the sum running over i = 1, …, n,

where tᵢ is the target (actual, real) output, the y value from the dataset, and pᵢ is the predicted (calculated) output. For a single error term e(k), the derivative of its square is ∂e²(k)/∂e(k) = 2e(k).

In Python:
errors = Y - predictions
cost = (errors**2).sum()
And we can update our coefficients:

a_new = a_old + Δa

With e = y − (ax + b), the chain rule gives ∂e²/∂a = 2e·(−x) = −2(y − (ax + b))·x, so the gradient-descent step is

Δa = −α · ∂e²/∂a = 2α (y − (ax + b)) x

where y is the target (actual, real) output, the y value from the dataset, and ax + b is the predicted (calculated) output.
import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([3, 5, 7, 9, 11])

class myLinearRegression(object):
    def __init__(self, lrate=0.01, niter=10):
        self.lrate = lrate
        self.niter = niter

    def net_input(self, X):
        # Predicted output: coefficient[0] is the intercept, coefficient[1] is the slope
        return self.coefficient[0] + self.coefficient[1] * X

    def fit(self, X, Y):
        # Coefficients start at zero
        self.coefficient = np.zeros(2)
        # Errors
        self.errors = []
        # Cost function
        self.cost = []
        for i in range(self.niter):
            predicted = self.net_input(X)
            errors = Y - predicted
            # Gradient-descent updates for the slope and the intercept
            self.coefficient[1:] += self.lrate * X.T.dot(errors)
            self.coefficient[0] += self.lrate * errors.sum()
            cost = (errors**2).sum() / 2.0
            self.cost.append(cost)
        return self

model = myLinearRegression(lrate=0.01, niter=10).fit(X, Y)
print(model.coefficient[0])            # intercept, approximately 0.6
print(model.coefficient[1])            # slope, approximately 2.1
print(model.net_input(np.array([4])))  # prediction for x = 4, approximately 9.0
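As a sanity check, the learned line can be compared with the exact least-squares fit (a minimal sketch; np.polyfit is used here only for comparison and is not part of the gradient-descent implementation):

# Exact least-squares fit: returns [slope, intercept]
slope, intercept = np.polyfit(X, Y, 1)
print(slope, intercept)   # close to 2 and 1, since Y = 2X + 1 exactly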
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

y = np.array([3, 5, 7, 11])
X = np.array([[1], [2], [3], [5]])

# Fit a linear regression model on X and y
model = LinearRegression()
model.fit(X, y)

y_pred = model.predict(X)
print('predicted response:', y_pred)

y_pred = model.predict([[11]])
print('predicted response:', y_pred)

y_pred = model.predict([[11], [15]])
print('predicted response:', y_pred)
# Making prediction
xnew = np.array([11])
xnew = xnew.reshape(-1, 1)
y_pred = model.predict(xnew)
print('predicted response:', y_pred)

xnew = np.array([11, 15])
print(xnew.shape)   # (2,)
xnew = xnew.reshape(-1, 1)
print(xnew.shape)   # (2, 1)
y_pred = model.predict(xnew)
print('predicted response:', y_pred)
print('slope:', model.coef_)
print('intercept:', model.intercept_)

print(type(x))
print(np.shape(x))
print(x)
print(y)

Example data with two inputs and one output:
x1  x2  y
3   1   8
Sample of the data (Width → Height); the Height for a Width of 1.4 is to be predicted:
Width    Height
1.9992   3.528
2.432    3.824
2.6316   4.5924
…        …
1.4      ?

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the dataset as a pandas DataFrame object
df = pd.read_csv('fishData.csv')

x = df['Width'].values
print(type(x))
print(x)
print(x.shape)

# Modify the input data shape by "reshaping" the input data
x = x.reshape(-1, 1)
print(x.shape)
print(x)

y = df['Height'].values
print(type(y))
print(y)
print(y.shape)
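The remaining steps (fit and predict) are not shown above; a minimal sketch, assuming the x, y arrays and the LinearRegression import prepared in the snippet:

# Fit the simple regression Width -> Height and predict the Height for Width = 1.4
model = LinearRegression()
model.fit(x, y)
print('slope:', model.coef_, 'intercept:', model.intercept_)
xnew = np.array([[1.4]])
print('predicted response:', model.predict(xnew))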
X1 X2 X3 X4 X5 Y
… … … … … …
41 44 46 12 7.55 ?
Linear regression implementation in scikit-learn

# Multiple Linear Regression with scikit-learn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Load the dataset as a pandas DataFrame object
df = pd.read_csv('fishDataMLR.csv')

# Examine and explore the data
print(type(df))
print(df)
print(df.shape)
df.head()
df.dtypes
len(df.columns)
df.describe()

# Display the number of null data observations
df.isnull().values.sum()

# Visualise the data
df.plot()
df.hist()
plt.show()
plt.scatter(df["Height"], df["Width"])

# Specify target and input features
target = df.iloc[:, 5].name
features = df.iloc[:, 0:5].columns.tolist()
print(features)

# Set the input variables
x = df[["Length1", "Length2", "Length3", "Height", "Width"]]
print(type(x))
print(x)
print(x.shape)

# Set the output variable
y = df['Weight'].values
print(type(y))   # numpy.ndarray
print(y)
print(y.shape)
Implement the linear regression model

# Implement the model
model = LinearRegression()

# Fit (train) the model
model.fit(x, y)

# Evaluate the quality of the trained model
r_sq = model.score(x, y)
print('score of r square: ', r_sq)

# Display the learned parameters (coefficients) of the trained model
coeff_df = pd.DataFrame(model.coef_, x.columns, columns=['Coefficient'])
print(coeff_df)
print('intercept:', model.intercept_)

# Use the trained (fitted) model for prediction
xnew = np.array([[41, 44, 46, 12, 7.55]])
y_pred = model.predict(xnew)
print('predicted response:', y_pred)
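The score above is computed on the same data the model was trained on; a common refinement is to hold out part of the data for testing. A minimal sketch using scikit-learn's train_test_split (the split ratio and random_state are arbitrary choices, not part of the original workflow):

from sklearn.model_selection import train_test_split

# Hold out 20% of the data to measure how well the model generalises
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(x_train, y_train)
print('train R^2:', model.score(x_train, y_train))
print('test  R^2:', model.score(x_test, y_test))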
import pandas
df = pandas.read_csv('fishDataMLR.csv')

The trained model takes the features Length1, Length2, Length3, Height and Width as inputs and predicts Weight.
Mathematics of Gradient Descent – Artificial Intelligence & Deep Learning
𝑦 = 𝑚𝑥 + 𝑏
Simple regression
Simple linear regression uses traditional slope-intercept form, where m and b are
the variables our algorithm will try to “learn” to produce the most accurate
predictions. x represents our input data and y represents our prediction.
https://www.youtube.com/watch?v=jc2IthslyzM
The goal of any Machine Learning Algorithm is to minimize the Cost Function.
Error = y − ŷ
Cost function
Let’s use MSE (L2) as our cost function. MSE measures the average squared
difference between an observation’s actual and predicted values. The output is a
single number representing the cost, or score, associated with our current set of
weights. Our goal is to minimize MSE to improve the accuracy of our model.
Given our simple linear equation y = mx + b, we can calculate MSE as:

MSE = (1/N) Σᵢ (yᵢ − (m xᵢ + b))²,  with the sum running over i = 1, …, N.
To minimize MSE we use Gradient Descent to calculate the gradient of our cost
function.
There are two parameters (coefficients) in our cost function we can control: weight
m and bias b. Since we need to consider the impact each one has on the final
prediction, we use partial derivatives. To find the partial derivatives, we use the
Chain rule. We need the chain rule because (y − (mx + b))² is really two nested functions:
the inner function y − (mx + b) and the outer function (error)².
What is Gradient Descent?
error = y − ŷ
If we look carefully, our cost function (error²) is of the form y = x². In a Cartesian coordinate system, this is the equation of a parabola and can be represented graphically as a U-shaped curve with a single minimum.
error = y − ŷ. Let u = error, so that the loss is L = u² = error².

Over N samples the loss is

L(m, b) = (1/N) Σᵢ (yᵢ − ŷᵢ)² = (1/N) Σᵢ (errorᵢ)²,  with the sum running over i = 1, …, N.

If y = f(u) and u = g(x), the chain rule in Leibniz notation is dy/dx = (dy/du)·(du/dx). Applying it to the slope m:

∂L/∂u = ∂(u²)/∂u = 2u = 2·error
∂u/∂m = ∂(y − (mx + b))/∂m = ∂(y − mx − b)/∂m = −x
∂L/∂m = (∂L/∂u)·(∂u/∂m) = 2·error·(−x) = −2·x·error
Similarly for the bias b, with error = y − ŷ, u = error and L = u² = error²:

∂L/∂u = ∂(u²)/∂u = 2u = 2·error
∂u/∂b = ∂(y − (mx + b))/∂b = ∂(y − mx − b)/∂b = −1
∂L/∂b = (∂L/∂u)·(∂u/∂b) = 2·error·(−1) = −2·error
Updating the learnable parameters m and b:

m = m − α · ∂Loss/∂m
b = b − α · ∂Loss/∂b

Since ∂Loss/∂m = −2·x·error and ∂Loss/∂b = −2·error, each step moves m and b in the direction that decreases the loss.
We have derived the update rule for a single weight and bias. In reality a deep neural
network has a lot of weights and biases, which are represented as matrices (or tensors),
and so our update rule should also be modified to update all weights and biases of the
network simultaneously.
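A minimal NumPy sketch of such a simultaneous update, with the weights held in a matrix W and the biases in a vector b (the shapes and the gradient arrays dW and db are illustrative assumptions; in practice they come from the backward pass):

import numpy as np

# Example parameter arrays (shapes chosen only for illustration)
W = np.random.randn(3, 2)      # weight matrix
b = np.zeros(2)                # bias vector
dW = np.random.randn(3, 2)     # gradient of the loss w.r.t. W (assumed already computed)
db = np.random.randn(2)        # gradient of the loss w.r.t. b (assumed already computed)
alpha = 0.01                   # learning rate

# One gradient-descent step updates every weight and bias simultaneously
W = W - alpha * dW
b = b - alpha * db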
Linear Regression in Numpy / Linear Regression in sklearn

The same data generation and train/validation split are used for both implementations:

import numpy as np

# Data Generation
np.random.seed(42)
x = np.random.rand(100, 1)
y = 1 + 2 * x + .1 * np.random.randn(100, 1)

# Shuffles the indices
idx = np.arange(100)
np.random.shuffle(idx)

# Uses first 80 random indices for train
train_idx = idx[:80]
# Uses the remaining indices for validation
val_idx = idx[80:]

# Generates train and validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]
Make sure to always initialize your random seed to ensure the reproducibility of your results.
For each epoch, there are four training steps:
• Compute the model's predictions (the forward pass)
• Compute the loss, using the predictions, the labels and the appropriate loss function for the task at hand
• Compute the gradients for every parameter (the backward pass)
• Update the parameters

# Sets learning rate
lr = 1e-1
# Defines number of epochs
n_epochs = 1000

# Initializes parameters "a" (intercept) and "b" (slope); random initialization assumed here
a = np.random.randn(1)
b = np.random.randn(1)

for epoch in range(n_epochs):
    # Step 1: computes our model's predicted output (forward pass)
    yhat = a + b * x_train
    # Step 2: how wrong is our model? That's the error!
    error = (y_train - yhat)
    # It is a regression, so it computes mean squared error (MSE)
    loss = (error ** 2).mean()
    # Step 3: computes gradients for parameters "a" and "b" (backward pass)
    a_grad = -2 * error.mean()
    b_grad = -2 * (x_train * error).mean()
    # Step 4: updates parameters using the gradients and the learning rate
    a = a - lr * a_grad
    b = b - lr * b_grad

print(a, b)
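For comparison, a minimal scikit-learn sketch fitted on the same training split (it uses the x_train, y_train, x_val and y_val arrays generated above; the printed parameters should be close to a and b):

from sklearn.linear_model import LinearRegression

# Fit on the training split and inspect the learned intercept and slope
sk_model = LinearRegression()
sk_model.fit(x_train, y_train)
print(sk_model.intercept_, sk_model.coef_)
# Evaluate on the validation split
print('validation R^2:', sk_model.score(x_val, y_val))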
In NumPy, all arithmetic operates elementwise: the multiply() function or the simple * operator performs elementwise matrix multiplication.
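A short sketch contrasting elementwise multiplication with the matrix product (np.dot or the @ operator); the 2×2 arrays are arbitrary examples:

import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A * B)              # elementwise: [[ 5 12] [21 32]]
print(np.multiply(A, B))  # same as A * B
print(A @ B)              # matrix product: [[19 22] [43 50]]
print(np.dot(A, B))       # same as A @ B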