
CENG313 Introduction to Data Science
Lecture 13: Regression

Instructor: Assist. Prof. Ceren Güzel Turhan


Regression

• Regression is similar to classification: you have a number of input features, and you want to predict an output feature. In classification, this output feature is either binary or categorical. With regression, it is a real-valued number.

Typically, regression algorithms model the output as a linear combination of the inputs.
Regression

• Ex: Given the following values of X and Y as (X, Y):( 1,1), (2,2), (4,4), (100,100), (20,
20)
• What is the value of Y when X is 5?
• The answer is : 5. Not very difficult, right?

• Now, let’s take a look at different example. Say you have the following pairs of X
and Y: (1,1), (2,4), (4,16), (100,10000), (20, 400). Can you calculate the value of Y
when X is 5?
• The answer is : 25. Was it difficult?

Ref: https://medium.com/@amarbudhiraja
Regression

• Let's understand a bit as to what happened in the above examples.

• When we look at the first example, after looking at the given pairs, one can establish that the relationship between X and Y is Y = X.
• Similarly, in the second example, the relationship is Y = X*X.
• In these two examples, we can determine the relationship between the two given variables (X and Y) because we could easily identify the relationship between them.

• Your computer looks at some examples and then tries to identify "the most suitable" relationship between the sets X and Y. Using this identified relationship, it will try to predict Y for new examples for which you don't know Y.

Ref: https://medium.com/@amarbudhiraja
What is “Linear”?
You have a collection of (x, y) pairs, and you try to fit a line to them of the form Y = mX + B.
Fitting a line to data.

Remember this: Y = mX + B?

Regression
Plot the points out, fit a line to them by eye, trace the line with a ruler, and use that to pull out m and b.

Each of the four datasets (Anscombe's quartet) has the same line of best fit and the same quality of fit.
Estimate of the regression coefficients
For a given data set: is this line good? Maybe this one? Or this one? (Each slide shows a different candidate line drawn through the same scatter of points.)

Estimate of the regression coefficients
Question: Which line is the best?

As before, for each observation $(x_i, y_i)$ the absolute residual, $r_i = |y_i - \hat{y}_i|$, quantifies the error at that observation.
Estimate of the regression coefficients

The standard way to fit a line is called least squares.

Least squares works by picking the values of m and b that minimize the "loss function", which adds up an error term across all of the points.

The line that best fits a set of data points is the one having the smallest possible sum of squared errors.
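The loss function itself appears on the slide as an image; written out (a standard reconstruction, using the slide's m and b), it is

$$L(m, b) = \sum_{i=1}^{n} \big( y_i - (m x_i + b) \big)^2$$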
Linear Regression Example
Which of the 3 lines captures the pattern in the data in the best possible way?
Need to compute the sum of the squares

For the line Y= 4X+10:
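The actual data points and the worked table are in the slide figure; as an illustration only (with made-up points, not the slide's), the sum of squared errors for the candidate line Y = 4X + 10 could be computed like this:

import numpy as np

# hypothetical data points, NOT the ones from the slide
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([15.0, 17.0, 23.0, 25.0])

y_hat = 4.0 * x + 10.0           # predictions from the candidate line
sse = np.sum((y - y_hat) ** 2)   # sum of squared errors for this line
print(sse)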


Linear Regression Example (Cont)
Regression

The standard way to fit a line is called least squares. In Python, a line can be fit using the LinearRegression class, and the fitted coefficients can be read off in the following way:

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> x = np.array([[0.0], [1.0], [2.0]])
>>> y = np.array([1.0, 2.0, 2.9])
>>> lm = LinearRegression().fit(x, y)
>>> lm.coef_  # m
array([ 0.95])
>>> lm.intercept_  # b
1.0166666666666671
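As a quick follow-up (my addition, not from the slide), the fitted model can then predict y for new values of x; with m = 0.95 and b ≈ 1.017 the prediction at x = 3 is about 3.87:

>>> lm.predict(np.array([[3.0]]))
array([ 3.86666667])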
Linear Regression to predict Diabetes
Regression
The key thing to understand here is that this loss function makes least squares regression
extremely sensitive to outliers in the data:
• Three deviations of size 5 will give a loss of 75, but just a single larger deviation of
size 10 will give the larger penalty of size 100.

Linear regression will bend the parameters so as to avoid large deviations of even a single
point, which makes it unsuitable in situations where a handful of large deviations are to be
expected.

An alternative approach that is better suited to data with outliers is to use the following loss function, which sums the absolute values of the errors rather than their squares:

$$L(m, b) = \sum_{i=1}^{n} \left| y_i - (m x_i + b) \right|$$
Regression

• Here we just take the absolute values of the different error terms and add them
• This is called “L1 regression,” among other names.

• Outliers will still have an impact, but it is not as egregious as with least squares (L2).

• On the other hand, L1 regression penalizes small deviations from expectation more
harshly compared to least squares, and it is significantly more complicated to
implement computationally.
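A minimal sketch of L1 (least absolute deviations) regression, using scipy's general-purpose optimizer; this is my own illustration under assumed toy data, not the lecture's code:

import numpy as np
from scipy.optimize import minimize

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 2.9, 4.2, 25.0])   # note the outlier at x = 4

def l1_loss(params):
    m, b = params
    return np.sum(np.abs(y - (m * x + b)))  # sum of absolute residuals

# Nelder-Mead is used because the L1 loss is not differentiable everywhere
result = minimize(l1_loss, x0=[0.0, 0.0], method="Nelder-Mead")
m_l1, b_l1 = result.x
print(m_l1, b_l1)  # the slope stays close to ~1 despite the outlier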
Regression
Fitting Nonlinear Curves

• Fitting a curve to data is a ubiquitous problem, not just in data science but in engineering and the sciences in general.
• Often, there are good a priori reasons to expect a certain functional form, and extracting the best-fit parameters will tell us something very meaningful about the system we are studying.

A few examples:
• Exponential decay to some baseline. This is useful for modeling many processes where a system starts in some kind of agitated state and decays to a baseline.
Regression
Fitting Nonlinear Curves

Exponential growth:

Logistic growth:

Polynomials of various degrees:
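The formulas for these curves appear on the slide as images; as a reconstruction (with parameter names of my own choosing), the standard forms are

$$\text{exponential decay to a baseline: } y = C + A e^{-x/\tau}, \qquad \text{exponential growth: } y = A e^{k x}$$

$$\text{logistic growth: } y = \frac{L}{1 + e^{-k(x - x_0)}}, \qquad \text{polynomial of degree } n: \; y = a_0 + a_1 x + \dots + a_n x^n$$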

Least squares is the typical approach in all of these cases, where we pick the parameters so as to minimize the loss function.
Regression
Fitting Nonlinear Curves

The following script creates some data of the form y = 2 + 3x², adds some noise to it, and then uses curve_fit to fit a curve of the form y = a + bx² to the data.

import numpy as np
from scipy.optimize import curve_fit

xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = 2.0 + 3.0 * xs * xs + 0.2 * np.random.uniform(size=xs.shape)  # add noise to each point

def calc(x, a, b):
    return a + b * x * x

cf = curve_fit(calc, xs, ys)
best_fit_params = cf[0]

Running it, one run found a = 2.33677376 and b ≈ 3.0; the exact values will vary with the random noise.
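As a brief follow-up (my addition), the fitted parameters can be unpacked and used to evaluate the curve at new points:

a, b = best_fit_params
print(calc(5.0, a, b))  # predicted y at x = 5, roughly 2.3 + 3.0 * 25 ≈ 77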


Regression
Goodness of Fit: R2 and Correlation

When assessing the quality of a fitted curve, there are two questions we want to answer:
• How accurately can we predict values?
• We assumed that the data followed some functional form. Was that even a good assumption?

• The standard way to answer the first of the questions is called R2, pronounced “R squared.”
• R2 is often described as the fraction of the variance that is accounted for by the model.
• A value of 1.0 means a perfect match, and a value of 0 means you didn’t capture any of the
variation.
• In some cases (there are a few different definitions of R2 floating around), it can even take on
negative values.
Regression
Goodness of Fit: R2 and Correlation

The calculation of R2 is based on two concepts.

The total variation:

$$V_{\text{total}} = \sum_{i} (y_i - \bar{y})^2$$

where ȳ is the average of all the y values in your data.

The residual variation:

$$V_{\text{residual}} = \sum_{i} (y_i - \hat{y}_i)^2$$

where ŷᵢ is the model's prediction for the i-th data point.

Regression
Goodness of Fit: R2 and Correlation

These allow us to say, in a precise sense, that your fitted model accounts for a certain percentage of the variation in the data. The definition of R2 is then

$$R^2 = 1 - \frac{V_{\text{residual}}}{V_{\text{total}}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

and you can see it as the fraction of all variation that is captured by the model.

Of course, taking the squares of the residuals isn’t necessarily the “right” way to quantify
variation, but it is the most standard option.
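A minimal sketch (my own, not from the slides) of computing R2 directly from these definitions and checking it against sklearn's r2_score:

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])

total_variation = np.sum((y_true - y_true.mean()) ** 2)   # sum of squares around the mean
residual_variation = np.sum((y_true - y_pred) ** 2)       # sum of squared residuals
rsq_manual = 1.0 - residual_variation / total_variation

print(rsq_manual, r2_score(y_true, y_pred))  # the two values agree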
Regression
Goodness of Fit: R2 and Correlation

Despite looking like a square, technically R2 can be negative if your model is truly abysmal.
Having R2 = 0 is what you would see if you just defined your fitted function to return the
average of y as a constant value.

You can think of this as the crudest way to fit a function to data. Do any worse than this, and your R2
score will go negative.

from sklearn.metrics import r2_score

rsquared_linear = r2_score(Y_test, preds_linear)
Regression
Goodness of Fit: R2 and Correlation

• Another way to quantify your goodness-of-fit is to simply take the correlation between your
predicted values and the known values in the test data.
• This has the advantage that you can use Pearson, Spearman, or Kendall correlation, depending
on how you want to deal with outliers.
• On the other hand though, correlation just measures whether your predictions and target
values are related; it doesn’t measure whether they actually match up.
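As a hedged illustration (not the lecture's code), scipy.stats provides all three correlation measures, and the rank-based ones are less affected by outliers:

import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

y_true = np.array([3.0, 5.0, 7.0, 9.0, 40.0])   # the last point is an outlier
y_pred = np.array([2.8, 5.3, 6.9, 9.2, 15.0])

print(pearsonr(y_true, y_pred)[0])    # sensitive to the outlier
print(spearmanr(y_true, y_pred)[0])   # rank-based, more robust
print(kendalltau(y_true, y_pred)[0])  # also rank-based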
Regression
Goodness of Fit: R2 and Correlation

Correlation of Residuals

The main ways to measure goodness-of-fit in regression situations are R2 and correlation between predictions and targets.

The simplest way to assess the quality of our model form is to plot the known data against a curve of the predicted values. Do they match up?

In Anscombe's quartet, for example, it is visually clear that a linear model is the correct way to approach the first dataset, but the wrong way to approach the second one.
Remember: Types of Correlation

(Four example scatterplots.) Negative linear correlation: as x increases, y tends to decrease. Positive linear correlation: as x increases, y tends to increase. The remaining panels show no correlation and nonlinear correlation.
Multi-variate Linear Regression
Now let’s move on from fitting a curve and into topics that fit more firmly under the “machine learning”
umbrella.

Linear regression is the same process as fitting a line to data, except that we say

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_d x_d$$

where d is the number of input features we have.

Most of the previous sections carry over directly to this more general case: we fit the data using least
squares, we quantify performance using R2, and we can also use correlation between predicted and actual
values.
Linear Regression
The first big difference is that it’s no longer practical to plot the predicted curve against the actual data
points.

What you can do instead is to make a scatterplot between the known test values and the values predicted
for those test data points.

This allows us to gauge whether our model performs better for larger or smaller values and whether it
suffers from major outliers.
Linear Regression
This script will generate the scatterplot figure for the linear regression model (it assumes y_test and y_pred from a fitted model, and matplotlib.pyplot imported as plt):

from sklearn.metrics import r2_score
from sklearn.feature_selection import r_regression

rsq = r2_score(y_test, y_pred)
# r_regression expects a 2D feature matrix, so reshape the test targets
corr = r_regression(y_test.reshape(-1, 1), y_pred)[0]

# Create scatter plot with actual and predicted values
plt.scatter(y_test, y_pred)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('Lin. Reg. Corr=%f Rsq=%f' % (corr, rsq))
plt.show()
Linear Regression
We can see that there is a clear correlation between the predicted and actual numbers, but it is fairly tenuous.

In particular, we can see that there are a number of data points where the actual value was substantially below our predictions.

In fact, the fitted line as a whole looks slightly shallower than the data itself.

Together, these suggest that there are a number of anomalously low data points, which are pulling our overall predictions lower than perhaps they should be.
Linear Regression
The other thing that we can do with linear regression is use it to identify features in the data that are particularly
interesting.

In the example script, we rescaled the features with the normalize() function before fitting. (Strictly speaking, sklearn's normalize() scales each sample to unit norm; scaling each feature to mean 0 and standard deviation 1 would instead use StandardScaler.) With the features on comparable scales, the relative size of the weights in the linear model gives a sense of how related each feature is to the progression of diabetes.

>>> print(linear.coef_)
[-28.12698694 -33.32069944  85.46294936  70.47966698 -37.66512686
  20.59488356 -14.6726611   33.10813747  43.68434357  -5.50529361]

This suggests that the third and fourth features are particularly interesting, if we want to zero in on and examine
their relationship to diabetes more closely.
LASSO Regression and Feature Selection
Look at the coefficients in the linear regression again. We are able to identify several features as being more promising than the others as targets of further investigation, but the painful truth is that all of the coefficients except the last one are pretty big. There are two problems with this:

• It makes it harder to pinpoint exactly which features are the most interesting.
• There is a very good chance that the model is overfitted. Many of the moderate-sized coefficients could be set so that they balance each other out, yielding a slightly better fit on the training data itself but generalizing very poorly.
LASSO Regression and Feature Selection

• The idea of LASSO regression is that we still fit a linear model to the data, but we
want to penalize nonzero weights.

• A LASSO regression model takes in a parameter called alpha, which indicates how severely nonzero weights should be punished.

• Setting alpha to 0 reduces LASSO to ordinary linear regression.

• The default value, which was used in the example script, is 1.0 (see the brief sketch below for how alpha affects the coefficients).
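A minimal sketch (my own, assuming the X_train and Y_train diabetes split from the example script later in this lecture) of how alpha controls sparsity:

import numpy as np
from sklearn.linear_model import Lasso

# assumes X_train, Y_train from the diabetes example later in the lecture
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha).fit(X_train, Y_train)
    n_zero = np.sum(model.coef_ == 0.0)
    print("alpha=%s: %d of %d coefficients are exactly zero"
          % (alpha, n_zero, len(model.coef_)))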
LASSO Regression and Feature Selection

The sample script produces the same scatterplot and performance metrics that were created for linear regression. We can see that the predicted/actual scatterplot hugs the middle line a little more closely, suggesting a better fit.
LASSO Regression and Feature Selection

The difference between linear and LASSO regression jumps out when we look at the fitted coefficients:

>>> print(lasso.coef_)
[ -0.         -11.49747021  73.20707164  37.75257628   0.           0.
 -10.36895667   3.70576596  24.17976499   0.        ]

Four of the ten features have weights of precisely 0. Of the remaining features, it is clear that the third is the most relevant to diabetes progression, followed by the fourth and the ninth.
Regression

Example: Predicting Diabetes Progression

The following script uses a dataset describing physiological measurements taken from 442 diabetes
patients, with the target variable being an indicator of the progression of their disease.

import sklearn.datasets
import pandas as pd
Regression Example
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.preprocessing import normalize
from sklearn.metrics import r2_score
diabetes = sklearn.datasets.load_diabetes()
X, Y = normalize(diabetes['data']), diabetes['target']
X_train, X_test, Y_train, Y_test = \
train_test_split(X, Y, test_size=.8)
Regression Example
linear = LinearRegression()
linear.fit(X_train, Y_train)
preds_linear = linear.predict(X_test)
corr_linear = round(pd.Series(preds_linear).corr(
pd.Series(Y_test)), 3)
rsquared_linear = r2_score(Y_test, preds_linear)
print("Linear coefficients:")
print(linear.coef_)
plt.scatter(preds_linear, Y_test)
plt.title("Lin. Reg. Corr=%f Rsq=%f"
% (corr_linear, rsquared_linear))
plt.xlabel("Predicted")
plt.ylabel("Actual")
# add x=y line for comparison
plt.plot(Y_test, Y_test, 'k--')
plt.show()
Regression
lasso = Lasso()
lasso.fit(X_train, Y_train)
preds_lasso = lasso.predict(X_test)
corr_lasso = round(pd.Series(preds_lasso).corr(
pd.Series(Y_test)), 3)
rsquared_lasso = round(
r2_score(Y_test, preds_lasso), 3)
print("Lasso coefficients:")
print(lasso.coef_)
plt.scatter(preds_lasso, Y_test)
plt.title("Lasso. Reg. Corr=%f Rsq=%f"
% (corr_lasso, rsquared_lasso))
plt.xlabel("Predicted")
plt.ylabel("Actual")
# add x=y line for comparison
plt.plot(Y_test, Y_test, 'k--')
plt.show()
Regression

>>> reg.fit(X, y)

How do we solve for the coefficients in $y = \beta_1 x + \beta_0$? What values should $\beta_1$ and $\beta_0$ take?
Optimization
How does one minimize a loss function?
The global minima or maxima of $L(\beta_0, \beta_1)$ must occur at a point where the gradient (slope) is zero:

$$\frac{\partial L}{\partial \beta_0} = 0, \qquad \frac{\partial L}{\partial \beta_1} = 0$$

• Brute Force: try every combination
• Exact: solve the above equations analytically
• Greedy Algorithm: gradient descent
Derivative definition
A derivative is the instantaneous rate of change of a single-valued function. Given a function f(x), the derivative can be defined as:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
Partial derivatives
For a loss function $L = f(\beta_0, \beta_1)$, the partial derivatives are written as:

$$\frac{\partial f}{\partial \beta_0}, \qquad \frac{\partial f}{\partial \beta_1}$$

What is the rate of change of the function with respect to one variable, with the others held fixed?
Partial derivative example

Looks like we're going to need the chain rule, but what is it? I forget!

(The slides work through the partial derivatives $\partial f / \partial \beta_0$ and $\partial f / \partial \beta_1$ of the loss step by step.)
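The worked derivation itself is an image on the slides; as a reconstruction, assuming the loss is the MSE $L = \frac{1}{n}\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$, the chain rule gives

$$\frac{\partial L}{\partial \beta_0} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right), \qquad \frac{\partial L}{\partial \beta_1} = -\frac{2}{n} \sum_{i=1}^{n} x_i \left( y_i - \beta_0 - \beta_1 x_i \right)$$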


Optimization

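The gradient-descent illustration on this slide is an image; here is a minimal sketch (my own, not the lecture's code) of gradient descent on the MSE loss for $y = \beta_1 x + \beta_0$:

import numpy as np

# toy data roughly following y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

b0, b1 = 0.0, 0.0          # initial guesses for intercept and slope
learning_rate = 0.01

for _ in range(5000):
    residuals = y - (b0 + b1 * x)
    grad_b0 = -2.0 * residuals.mean()          # dL/d(beta_0) for the MSE loss
    grad_b1 = -2.0 * (x * residuals).mean()    # dL/d(beta_1) for the MSE loss
    b0 -= learning_rate * grad_b0              # step downhill
    b1 -= learning_rate * grad_b1

print(b0, b1)  # should end up close to 1 and 2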
Optimization
Summary: Estimate of the regression coefficients

We use MSE as our loss function:

$$L(\beta_0, \beta_1) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_i) \right)^2$$

We choose β₀ and β₁ in order to minimize the predictive errors made by our model, i.e. to minimize our loss function.

Then the optimal values for β₀ and β₁ are:

$$\beta_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \beta_0 = \bar{y} - \beta_1 \bar{x}$$

Finding the exact solution only works in rare cases; linear regression is one of those rare cases. We call this FITTING or TRAINING the model.
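A small sketch (my own illustration) verifying this closed-form solution against sklearn's LinearRegression:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# closed-form (exact) solution for simple linear regression
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

lm = LinearRegression().fit(x.reshape(-1, 1), y)
print(beta1, lm.coef_[0])    # the slopes agree
print(beta0, lm.intercept_)  # the intercepts agree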
References
Introduction to Machine Learning with Python: A Guide for Data Scientists, by Andreas C. Müller & Sarah Guido.

Introduction to Regression, CS109A Introduction to Data Science, by Pavlos Protopapas & Natesh Pillai.