Regression and Classification


Datasets in univariate setting
$x^{(3)}$, $y^{(3)}$: the input and output of the third training example (third row below)
x1 y
0.00632 24
0.02731 21.6
0.02729 34.7
0.03237 33.4
0.06905 36.2
0.02985 28.7
0.08829 22.9
0.14455 27.1
0.21124 16.5
0.17004 18.9
0.22489 15
0.11747 18.9
0.09378 21.7
0.62976 20.4
0.63796 18.2
0.62739 19.9
1.05393 23.1
0.7842 17.5
0.80271 20.2
0.7258 18.2
Linear Regression Hypothesis (Model)
In matrix form
• $h_\theta(x) = \theta^T x$
• This is not just fancy math. This is important for implementation!
Why?
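One plausible answer to the question above: written as $\theta^T x$, the hypothesis is a single inner product that works for any number of features, and it maps directly onto the matrix-vector routines of array libraries. A minimal pure-Python sketch follows; the names `dot` and `predict` and the sample values are illustrative assumptions, not taken from the slides.

```python
def dot(theta, x):
    """Inner product theta^T x of two equal-length sequences."""
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same length")
    return sum(t * xi for t, xi in zip(theta, x))


def predict(theta, features):
    """Linear hypothesis h_theta(x) = theta^T x.

    `features` holds x_1..x_n; x_0 = 1 is prepended for the intercept.
    """
    return dot(theta, [1.0] + list(features))


# Illustrative parameters (assumed): intercept 2.0, slope 3.0.
theta = [2.0, 3.0]
print(predict(theta, [4.0]))  # evaluates 2.0 * 1 + 3.0 * 4.0

# The same function handles more features without any change.
print(predict([1.0, 0.5, -1.0], [2.0, 3.0]))
```

The same `predict` call serves the univariate and multivariate settings, which is the practical benefit of the matrix form.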
Linear Regression Cost Function

m = number of data points
n = number of features
$\theta_j$ = parameter for feature j
$h_\theta(x^{(i)})$ = our estimate of the result for the i-th example in the training set
$y^{(i)}$ = actual value of the result for the i-th example in the training set
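Assuming the cost is the commonly used halved mean squared error, $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$, it can be computed as sketched below. The $\frac{1}{2m}$ scaling, the function names, and the trial parameters are assumptions; the data are the first three rows of the univariate table.

```python
def predict(theta, features):
    """h_theta(x) = theta^T x, with x_0 = 1 prepended for the intercept."""
    x = [1.0] + list(features)
    return sum(t * xi for t, xi in zip(theta, x))


def linear_cost(theta, X, y):
    """Halved mean squared error over m data points (assumed form)."""
    m = len(y)
    squared_errors = ((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y))
    return sum(squared_errors) / (2 * m)


# First three rows of the univariate table (x1, y).
X = [[0.00632], [0.02731], [0.02729]]
y = [24.0, 21.6, 34.7]

# theta is an arbitrary guess, chosen only for illustration.
print(linear_cost([25.0, -10.0], X, y))
```

A lower value means the hypothesis fits the training set more closely; a perfect fit gives a cost of zero.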
Linear Regression: multivariate
• Linear regression with multiple features

• Each $X$ is a multidimensional vector; $y$ is a single reading
Datasets in Multivariate setting
$X^{(3)}$: feature vector of the third training example; $x_5^{(3)}$: its fifth feature; $y^{(3)}$: its output

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 y
0.00632 18 2.31 0 0.538 6.575 65.2 4.09 1 296 15.3 396.9 4.98 24
0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.9 9.14 21.6
0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7
0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4
0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.9 5.33 36.2
0.02985 0 2.18 0 0.458 6.43 58.7 6.0622 3 222 18.7 394.12 5.21 28.7
0.08829 12.5 7.87 0 0.524 6.012 66.6 5.5605 5 311 15.2 395.6 12.43 22.9
0.14455 12.5 7.87 0 0.524 6.172 96.1 5.9505 5 311 15.2 396.9 19.15 27.1
0.21124 12.5 7.87 0 0.524 5.631 100 6.0821 5 311 15.2 386.63 29.93 16.5
0.17004 12.5 7.87 0 0.524 6.004 85.9 6.5921 5 311 15.2 386.71 17.1 18.9
0.22489 12.5 7.87 0 0.524 6.377 94.3 6.3467 5 311 15.2 392.52 20.45 15
0.11747 12.5 7.87 0 0.524 6.009 82.9 6.2267 5 311 15.2 396.9 13.27 18.9
0.09378 12.5 7.87 0 0.524 5.889 39 5.4509 5 311 15.2 390.5 15.71 21.7
0.62976 0 8.14 0 0.538 5.949 61.8 4.7075 4 307 21 396.9 8.26 20.4
0.63796 0 8.14 0 0.538 6.096 84.5 4.4619 4 307 21 380.02 10.26 18.2
0.62739 0 8.14 0 0.538 5.834 56.5 4.4986 4 307 21 395.62 8.47 19.9
1.05393 0 8.14 0 0.538 5.935 29.3 4.4986 4 307 21 386.85 6.58 23.1
0.7842 0 8.14 0 0.538 5.99 81.7 4.2579 4 307 21 386.75 14.67 17.5
0.80271 0 8.14 0 0.538 5.456 36.6 3.7965 4 307 21 288.99 11.69 20.2
0.7258 0 8.14 0 0.538 5.727 69.5 3.7965 4 307 21 390.95 11.28 18.2
Hypothesis (Model)
Our prediction is called the hypothesis, or the model.
Example

$h_\theta(x) = 70 + 0.2x_1 + 0.01x_2 + 4x_3 - x_4$

The intercept term (70) is multiplied by $x_0 = 1$
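The example hypothesis can be evaluated as a single inner product once $x_0 = 1$ is prepended to the features. The sketch below reads the parameters off the example; the input values are hypothetical and not taken from the dataset.

```python
# Parameters read off the example: theta_0 = 70 (intercept), then the
# coefficients of x1..x4.
theta = [70.0, 0.2, 0.01, 4.0, -1.0]


def hypothesis(theta, features):
    """Evaluate h_theta(x) = theta^T x with x_0 = 1 prepended."""
    x = [1.0] + list(features)
    return sum(t * xi for t, xi in zip(theta, x))


# Hypothetical input (x1, x2, x3, x4), not taken from the dataset.
sample = [10.0, 100.0, 2.0, 5.0]
print(hypothesis(theta, sample))  # 70 + 2 + 1 + 8 - 5
```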


Classification Example
Email: Spam/ Not Spam?
Online Transactions: Fraudulent (Yes/No)?
Tumor: Malignant/ Benign?
Limitation of Linear Regression

Applying linear regression to a classification problem is not a good approach: its predictions are unbounded, so they cannot be read directly as class probabilities.
Logistic Regression (Classification)
• We apply logistic regression to classification problems
• Let us see how logistic regression works
$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$
• We interpret this hypothesis as the probability of having y = 1
• Hence the probability of having y = 0 is $1 - h_\theta(x)$
• $h_\theta(x) = 0.7$ means a 70% chance of a positive output
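The logistic hypothesis and its probability reading can be sketched as follows. The function names, parameters, and input are illustrative assumptions; the two-branch form of `sigmoid` is only there to avoid overflow in `math.exp` for large negative inputs.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(theta, features):
    """h_theta(x): estimated probability that y = 1 (x_0 = 1 prepended)."""
    x = [1.0] + list(features)
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


# Illustrative parameters and input (assumed, not from the slides).
theta = [-1.0, 2.0]
p_positive = predict_proba(theta, [0.5])  # z = -1 + 2 * 0.5 = 0
print(p_positive, 1.0 - p_positive)       # P(y = 1) and P(y = 0)
```

Whatever the value of $\theta^T x$, the output stays strictly between 0 and 1, which is what allows the probability interpretation.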
Logistic Regression (Classification)
Hypothesis
$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$
Cost function for one data point

$J(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$

Cost function over all data points
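A minimal sketch of the per-point cost, together with one way to extend it to all data points. Averaging the per-point costs over the m examples is an assumption here, as are the function names and the sample predictions.

```python
import math


def point_cost(h, y):
    """Cost for one data point: -y*log(h) - (1-y)*log(1-h), with 0 < h < 1."""
    return -y * math.log(h) - (1 - y) * math.log(1 - h)


def average_cost(predictions, labels):
    """Mean of the per-point costs over all m data points (assumed form)."""
    m = len(labels)
    return sum(point_cost(h, y) for h, y in zip(predictions, labels)) / m


# Hypothetical predicted probabilities and true labels.
predictions = [0.9, 0.2, 0.7]
labels = [1, 0, 1]
print(average_cost(predictions, labels))

# A confident wrong prediction costs far more than a confident right one.
print(point_cost(0.99, 1), point_cost(0.01, 1))
```

Only one of the two terms is active for a given label: the first when y = 1, the second when y = 0.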



Logistic Regression
• To minimize the hypothesis error, we take the derivative of the cost function, as in the linear regression case
Logistic Regression error minimization
$\frac{\partial J(h_\theta(x), y)}{\partial \theta_j} = (h_\theta(x) - y)\, x_j$

• The formula above gives the derivative used to minimize the error for an individual data point

• To minimize the error over all data points on average, we take the sum over the data points and divide by m
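The averaging step described above can be sketched as follows: compute $(h_\theta(x) - y)\,x_j$ for every data point, sum over the points, and divide by m. The function names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)); adequate for moderate |z|."""
    return 1.0 / (1.0 + math.exp(-z))


def average_gradient(theta, X, y):
    """Average over all m points of (h_theta(x) - y) * x_j, for each j.

    Each row of X is expected to already start with x_0 = 1.
    """
    m = len(y)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        for j, xij in enumerate(xi):
            grad[j] += (h - yi) * xij
    return [g / m for g in grad]


# Hypothetical toy data: one feature plus the intercept column x_0 = 1.
X = [[1.0, 0.5], [1.0, 2.0], [1.0, 3.5]]
y = [0, 0, 1]
print(average_gradient([0.0, 0.0], X, y))
```

The result has one entry per parameter $\theta_j$, so it can be subtracted from the parameter vector component by component.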
Gradient Descent Regressions
Linear Regression Logistic Regression (Classification)

Technically, we do not need to calculate the cost function in the gradient descent case; the parameter update uses only its derivative.
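A minimal sketch of batch gradient descent for both models, assuming the usual update $\theta_j := \theta_j - \alpha \cdot \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\,x_j^{(i)}$. The learning rate `alpha`, the step count, the function names, and the toy data are assumptions. The two models differ only in the function applied to $\theta^T x$, and the cost is never evaluated.

```python
import math


def linear_h(z):
    """Linear regression hypothesis: the identity on theta^T x."""
    return z


def logistic_h(z):
    """Logistic regression hypothesis: sigmoid of theta^T x."""
    return 1.0 / (1.0 + math.exp(-z))


def gradient_descent(X, y, h, alpha=0.1, steps=1000):
    """Batch gradient descent; rows of X start with x_0 = 1.

    Only the gradient (1/m) * sum((h - y) * x_j) is used; the cost
    itself is never computed.
    """
    m, n = len(y), len(X[0])
    theta = [0.0] * n
    for _ in range(steps):
        errors = [h(sum(t * v for t, v in zip(theta, xi))) - yi
                  for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m
                for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data generated from y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
print(gradient_descent(X, [1.0, 3.0, 5.0], linear_h, steps=5000))

# Hypothetical binary labels for the same inputs.
print(gradient_descent(X, [0, 0, 1], logistic_h))
```

In practice the cost is still often tracked alongside the updates to check that it decreases, but the update rule itself does not depend on it.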
