
Introduction to Regression

Part A – Linear Models


Lecture Outline

• Linear models
• Estimate of the regression coefficients
• Model evaluation
• Interpretation

1
Predicting a Variable

Let’s imagine a scenario where we'd like to predict one variable using
another (or a set of other) variables.

Examples:
• Predicting the number of views a YouTube video will get next week
based on video length, the date it was posted, the previous number of
views, etc.
• Predicting which movies a Netflix user will rate highly based on their
previous movie ratings, demographic data, etc.
• Recommendation system

2
Data
The Advertising data set consists of the sales of a particular
product in 200 different markets, together with advertising budgets
for the product in each of those markets for three different
media: TV, radio, and newspaper. Everything is given in units
of $1000.

  TV     radio  newspaper  sales
  230.1  37.8   69.2       22.1
  44.5   39.3   45.1       10.4
  17.2   45.9   69.3       9.3
  151.5  41.3   58.5       18.5
  180.8  10.8   58.4       12.9

3
Response vs. Predictor Variables

There is an asymmetry in many of these problems:


The variable we would like to predict may be more
difficult to measure, may be more important than the
other(s), or may be directly or indirectly influenced by the
other variable(s).

Thus, we'd like to define two categories of variables:


• variables whose values we want to predict
• variables whose values we use to make our prediction

4
Response vs. Predictor Variables

X: predictors, features, covariates
Y: outcome, response variable, dependent variable

The data table has n observations (rows) and p predictors (columns):

  TV     radio  newspaper | sales
  230.1  37.8   69.2      | 22.1
  44.5   39.3   45.1      | 10.4
  17.2   45.9   69.3      | 9.3
  151.5  41.3   58.5      | 18.5
  180.8  10.8   58.4      | 12.9
5
Response vs. Predictor Variables
$X = X_1, \dots, X_p$, where $X_j = x_{1j}, \dots, x_{ij}, \dots, x_{nj}$, and $Y = y_1, \dots, y_n$.

X: predictors, features, covariates
Y: outcome, response variable, dependent variable

The data table has n observations (rows) and p predictors (columns):

  TV     radio  newspaper | sales
  230.1  37.8   69.2      | 22.1
  44.5   39.3   45.1      | 10.4
  17.2   45.9   69.3      | 9.3
  151.5  41.3   58.5      | 18.5
  180.8  10.8   58.4      | 12.9
6
Linear Models
Suppose we ask the question:

"How much more in sales do we expect if we double the TV advertising budget?"

We can answer questions like this by building a model, first assuming a simple form for $f$:

$$f(X) = \beta_0 + \beta_1 X$$

7
Linear Regression

… then it follows that our estimate is:

$$\hat{Y} = \hat{f}(X) = \hat{\beta}_1 X + \hat{\beta}_0$$

where $\hat{\beta}_0$ and $\hat{\beta}_1$ are estimates of $\beta_0$ and $\beta_1$, respectively, that
we compute using the observations.

8
Estimate of the regression coefficients
For a given data set

9
Estimate of the regression coefficients (cont)
Is this line good?

10
Estimate of the regression coefficients (cont)
Maybe this one?

11
Estimate of the regression coefficients (cont)
Or this one?

12
Estimate of the regression coefficients (cont)
Question: Which line is the best?
For each observation $(x_i, y_i)$, we compute the residual as the absolute difference
between the observed and predicted value: $r_i = |y_i - \hat{y}_i|$.
Loss Function: Aggregate Residuals

How do we aggregate residuals across the entire dataset?

1. Max Absolute Error


2. Mean Absolute Error
3. Mean Squared Error
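
As a small numerical sketch (not from the slides), here is how these three aggregate error measures could be computed with NumPy; the arrays y_true and y_pred hold made-up example values.

```python
import numpy as np

# Hypothetical observed values and model predictions (illustrative only)
y_true = np.array([22.1, 10.4, 9.3, 18.5, 12.9])
y_pred = np.array([20.5, 11.2, 8.1, 17.9, 14.0])

residuals = y_true - y_pred

max_abs_error = np.max(np.abs(residuals))    # 1. Max Absolute Error
mean_abs_error = np.mean(np.abs(residuals))  # 2. Mean Absolute Error
mean_sq_error = np.mean(residuals ** 2)      # 3. Mean Squared Error

print(max_abs_error, mean_abs_error, mean_sq_error)
```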

14
Estimate of the regression coefficients (cont)

Again, we use MSE as our loss function,

$$L(\beta_0, \beta_1) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left[y_i - (\beta_1 x_i + \beta_0)\right]^2.$$

We choose $\hat{\beta}_0$ and $\hat{\beta}_1$ in order to minimize the predictive errors made by
our model, i.e. to minimize our loss function.

The optimal values for $\hat{\beta}_0$ and $\hat{\beta}_1$ are then:

$$\hat{\beta}_0, \hat{\beta}_1 = \underset{\beta_0, \beta_1}{\operatorname{argmin}}\, L(\beta_0, \beta_1).$$

We call this FITTING or TRAINING the model.

15
Optimization

How does one minimize a loss function?


The global minimum or maximum of
$L(\beta_0, \beta_1)$ must occur at a point where
the gradient (slope) is zero:

$$\nabla L = \left[\frac{\partial L}{\partial \beta_0}, \frac{\partial L}{\partial \beta_1}\right] = 0$$

• Brute Force: Try every combination


• Exact: Solve the above equation
• Greedy Algorithm: Gradient Descent

16
Optimization: Estimate of the regression coefficients
Brute force

One way to estimate $\operatorname{argmin}_{\beta_0, \beta_1} L$ is to calculate the loss function for every
possible $\beta_0$ and $\beta_1$, and then select the $\beta_0$ and $\beta_1$ that minimize the loss function.

Example: evaluate the loss function for different values of $\beta_1$ while $\beta_0$ is fixed to 6.

This is very computationally expensive when there are many coefficients.
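
A minimal sketch of this brute-force idea in NumPy, assuming a small one-predictor dataset (x, y) and a coarse grid of candidate coefficients; the data and grid ranges are arbitrary choices for illustration.

```python
import numpy as np

# Illustrative data: one predictor x and a response y (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

def mse(beta0, beta1):
    """Mean squared error of the line y = beta1 * x + beta0 on this data."""
    return np.mean((y - (beta1 * x + beta0)) ** 2)

# Try every combination of beta0 and beta1 on a coarse grid
best = None
for beta0 in np.linspace(-10, 10, 201):
    for beta1 in np.linspace(-10, 10, 201):
        loss = mse(beta0, beta1)
        if best is None or loss < best[0]:
            best = (loss, beta0, beta1)

print("best loss, beta0, beta1:", best)
```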

17
Gradient Descent
When we can’t analytically solve for the stationary points of the gradient, we
can still exploit the information in the gradient.
The gradient ∇𝐿 at any point is the direction of the steepest increase. The
negative gradient is the direction of steepest decrease.
By repeatedly stepping in the direction of the negative gradient, we can eventually find the lowest point.
This method is called Gradient Descent.
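
A small sketch of gradient descent for simple linear regression (illustration only; the data, learning rate, and number of steps are arbitrary assumptions):

```python
import numpy as np

# Illustrative data (same made-up values as above)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

beta0, beta1 = 0.0, 0.0   # initial guess
lr = 0.01                 # learning rate (step size)

for _ in range(5000):
    y_hat = beta1 * x + beta0
    # Gradient of the MSE loss with respect to beta0 and beta1
    grad_beta0 = -2 * np.mean(y - y_hat)
    grad_beta1 = -2 * np.mean((y - y_hat) * x)
    # Step in the direction of the negative gradient
    beta0 -= lr * grad_beta0
    beta1 -= lr * grad_beta1

print(beta0, beta1)
```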

18
Estimate of the regression coefficients: analytical solution
Take the gradient of the loss function and find the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ where the
gradient is zero:

$$\nabla L = \left[\frac{\partial L}{\partial \beta_0}, \frac{\partial L}{\partial \beta_1}\right] = 0$$

This does not usually yield a closed-form solution. However, for linear regression this
procedure gives us explicit formulae for $\hat{\beta}_0$ and $\hat{\beta}_1$:

$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},$$

where $\bar{y}$ and $\bar{x}$ are the sample means.

The line $\hat{Y} = \hat{\beta}_1 X + \hat{\beta}_0$ is called the regression line.
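
For reference, a sketch of these closed-form estimates in NumPy (using the same illustrative x and y as above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Explicit formulae for the simple linear regression coefficients
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)
```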
19
Evaluation: Test Error

We need to evaluate the fitted model on new data, that is, data the model
did not train on: the test data.

The training MSE here is 2.0, whereas the test MSE is 12.3.
The training data contains a strange point, an outlier, which confuses the model.

Fitting to meaningless patterns in the training data is called overfitting.
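
A sketch of how training and test MSE could be compared with scikit-learn; the data here is synthetic and hypothetical, and the split fraction is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data: 200 observations of one predictor
rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(200, 1))
y = 7 + 0.05 * X[:, 0] + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)

train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))
print(train_mse, test_mse)
```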

20
Evaluation: Model Interpretation

For linear models it’s important to interpret the parameters

• The MSE of one model is very small, but the slope is -0.05. That would mean the larger the budget, the lower the sales.
• The MSE of another model is very small, but the intercept is -0.5, which means that for a very small budget we would have negative sales.

21
Multi, Poly Regression and Model Selection
Part B: Multi-regression
Multiple Linear Regression

If you must guess someone's height, would you rather be told


• Their weight, only
• Their weight and gender
• Their weight, gender, and income
• Their weight, gender, income, and favorite number

Of course, you'd always want as much data about a person as possible.


Even though height and favorite number may not be strongly related, at
worst you could just ignore the information on favorite number. We want
our models to be able to take in lots of data as they make their
predictions.

23
Response vs. Predictor Variables

X: predictors, features, covariates
Y: outcome, response variable, dependent variable

The data table has n observations (rows) and p predictors (columns):

  TV     radio  newspaper | sales
  230.1  37.8   69.2      | 22.1
  44.5   39.3   45.1      | 10.4
  17.2   45.9   69.3      | 9.3
  151.5  41.3   58.5      | 18.5
  180.8  10.8   58.4      | 12.9
24
Multilinear Models

In practice, it is unlikely that any response variable Y depends solely on one predictor X.
Rather, we expect Y to be a function of multiple predictors, $f(X_1, \dots, X_J)$. Using the
notation we introduced last lecture,

$$Y = y_1, \dots, y_n, \qquad X = X_1, \dots, X_J, \qquad X_j = x_{1j}, \dots, x_{ij}, \dots, x_{nj},$$

we can still assume a simple form for $f$, a multilinear form:

$$f(X_1, \dots, X_J) = \beta_0 + \beta_1 X_1 + \dots + \beta_J X_J$$

Hence, $\hat{f}$ has the form:

$$\hat{f}(X_1, \dots, X_J) = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \dots + \hat{\beta}_J X_J$$

25
Multiple Linear Regression

Given a set of observations,

$$\{(x_{1,1}, \dots, x_{1,J}, y_1), \dots, (x_{n,1}, \dots, x_{n,J}, y_n)\},$$

the data and the model can be expressed in vector notation,

$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{1,1} & \dots & x_{1,J} \\ 1 & x_{2,1} & \dots & x_{2,J} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \dots & x_{n,J} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_J \end{pmatrix}.$$

26
Multilinear Model, example

For our data,

$$\text{Sales} = \beta_0 + \beta_1 \times \text{TV} + \beta_2 \times \text{Radio} + \beta_3 \times \text{Newspaper}$$

In linear algebra notation:

$$Y = \begin{pmatrix} \text{Sales}_1 \\ \vdots \\ \text{Sales}_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & \text{TV}_1 & \text{Radio}_1 & \text{News}_1 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & \text{TV}_n & \text{Radio}_n & \text{News}_n \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_3 \end{pmatrix},$$

so that $Y = X \times \beta$.

27
Multiple Linear Regression

The model takes a simple algebraic form:

$$Y = X\beta + \epsilon$$

We will again choose the MSE as our loss function, which can be
expressed in vector notation as

$$\text{MSE}(\beta) = \frac{1}{n}\lVert Y - X\beta \rVert^2$$

Minimizing the MSE using vector calculus yields

$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top Y = \underset{\beta}{\operatorname{argmin}}\, \text{MSE}(\beta).$$
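
A sketch of this closed-form solution in NumPy (hypothetical design matrix; in practice np.linalg.lstsq or sklearn's LinearRegression is usually preferred over forming the inverse explicitly):

```python
import numpy as np

# Hypothetical data: n = 200 observations, J = 3 predictors
rng = np.random.default_rng(0)
X_raw = rng.uniform(0, 100, size=(200, 3))
y = 3 + X_raw @ np.array([0.05, 0.2, 0.01]) + rng.normal(0, 1, size=200)

# Prepend a column of ones for the intercept term
X = np.column_stack([np.ones(len(y)), X_raw])

# Normal equation: solve (X^T X) beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```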

28
Interpreting multi-linear regression

For linear models, it is easy to interpret the model parameters.

But when we have a large number of predictors $X_1, \dots, X_J$, there will be a large
number of model parameters $\beta_1, \beta_2, \dots, \beta_J$.

Looking at the raw values of the $\beta$'s is impractical, so we visualize these values in a
feature importance graph.

The feature importance graph shows which predictors have the most impact on the
model's prediction.
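
One common way to draw such a graph is a bar chart of the fitted coefficients; this is a sketch with made-up data and hypothetical feature names, and it standardizes the predictors first so the coefficient magnitudes are comparable.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical advertising-style data: 3 predictors, 200 observations
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))
y = 3 + X @ np.array([0.05, 0.2, 0.01]) + rng.normal(0, 1, size=200)

feature_names = ["TV", "radio", "newspaper"]   # hypothetical predictor names

# Standardize so the coefficient magnitudes are comparable across predictors
X_std = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_std, y)

plt.barh(feature_names, model.coef_)           # one bar per coefficient
plt.xlabel("coefficient (standardized predictors)")
plt.title("Feature importance")
plt.show()
```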

29
Qualitative Predictors

So far, we have assumed that all variables are quantitative. But in
practice, often some predictors are qualitative.

Example: The Credit data set contains information about balance, age,
cards, education, income, limit, and rating for a number of potential
customers.

  Income   Limit  Rating  Cards  Age  Education  Gender  Student  Married  Ethnicity  Balance
  14.890   3606   283     2      34   11         Male    No       Yes      Caucasian  333
  106.02   6645   483     3      82   15         Female  Yes      Yes      Asian      903
  104.59   7075   514     4      71   11         Male    No       No       Asian      580
  148.92   9504   681     3      36   11         Female  No       No       Asian      964
  55.882   4897   357     2      68   16         Male    No       Yes      Caucasian  331
30
Qualitative Predictors

If the predictor takes only two values, then we create an indicator or
dummy variable that takes on two possible numerical values.

For example, for gender, we create a new variable:

$$x_i = \begin{cases} 1 & \text{if the } i\text{th person is female} \\ 0 & \text{if the } i\text{th person is male} \end{cases}$$

We then use this variable as a predictor in the regression equation:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i = \begin{cases} \beta_0 + \beta_1 + \epsilon_i & \text{if the } i\text{th person is female} \\ \beta_0 + \epsilon_i & \text{if the } i\text{th person is male} \end{cases}$$
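
A sketch of this in pandas/scikit-learn, using a hypothetical slice of a credit-style data set with made-up rows; the 0/1 mapping is the same idea as the indicator variable above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical slice of a credit-style data set
df = pd.DataFrame({
    "Gender":  ["Male", "Female", "Male", "Female", "Male"],
    "Balance": [333, 903, 580, 964, 331],
})

# Dummy variable: 1 if female, 0 if male
df["is_female"] = (df["Gender"] == "Female").astype(int)

model = LinearRegression().fit(df[["is_female"]], df["Balance"])
print(model.intercept_, model.coef_)   # estimated beta0, beta1
```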

31
Qualitative Predictors

Question: What is the interpretation of $\beta_0$ and $\beta_1$?

32
Qualitative Predictors

Question: What is the interpretation of $\beta_0$ and $\beta_1$?

• $\beta_0$ is the average credit card balance among males,

• $\beta_0 + \beta_1$ is the average credit card balance among females,

• and $\beta_1$ is the average difference in credit card balance between females
and males.

Example: Calculate $\beta_0$ and $\beta_1$ for the Credit data.

You should find $\beta_0 \approx \$509$ and $\beta_1 \approx \$19$.

33
More than two levels: One hot encoding

Often, the qualitative predictor takes more than two values (e.g. ethnicity in
the credit data).

In this situation, a single dummy variable cannot represent all possible


values.

We therefore create additional dummy variables:

$$x_{i,1} = \begin{cases} 1 & \text{if the } i\text{th person is Asian} \\ 0 & \text{if the } i\text{th person is not Asian} \end{cases}$$

$$x_{i,2} = \begin{cases} 1 & \text{if the } i\text{th person is Caucasian} \\ 0 & \text{if the } i\text{th person is not Caucasian} \end{cases}$$
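
A sketch with pandas' get_dummies (hypothetical ethnicity values); drop_first keeps one level as the baseline, which here happens to be African American because the levels are ordered alphabetically, matching the regression equation on the next slide.

```python
import pandas as pd

# Hypothetical qualitative predictor with three levels
df = pd.DataFrame({"Ethnicity": ["Asian", "Caucasian", "African American",
                                 "Asian", "Caucasian"]})

# One dummy column per level except the baseline level
dummies = pd.get_dummies(df["Ethnicity"], drop_first=True).astype(int)
print(dummies)
```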
34
More than two levels: One hot encoding

We then use these variables as predictors, and the regression equation becomes:

$$y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \epsilon_i = \begin{cases} \beta_0 + \beta_1 + \epsilon_i & \text{if the } i\text{th person is Asian} \\ \beta_0 + \beta_2 + \epsilon_i & \text{if the } i\text{th person is Caucasian} \\ \beta_0 + \epsilon_i & \text{if the } i\text{th person is African American} \end{cases}$$

Question: What is the interpretation of $\beta_0$, $\beta_1$, and $\beta_2$?

35
Polynomial Regression

36
Fitting non-linear data

Multi-linear models can fit large datasets with many


predictors. But the relationship between predictor and target
isn’t always linear.

We want a model

$$y = f_\beta(x),$$

where $f$ is a non-linear function and $\beta$ is a
vector of the parameters of $f$.

37
Polynomial Regression

The simplest non-linear model we can consider, for a response Y and a
predictor X, is a polynomial model of degree M:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_M x^M$$

Just as in the case of linear regression with cross terms, polynomial
regression is a special case of linear regression: we treat each power $x^m$ as a
separate predictor. Thus, we can write

$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_1 & \dots & x_1^M \\ 1 & x_2 & \dots & x_2^M \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & \dots & x_n^M \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_M \end{pmatrix}.$$

38
Polynomial Regression
This looks a lot like multi-linear regression where the predictors are
powers of x!
Multi-regression:

$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{1,1} & \dots & x_{1,J} \\ 1 & x_{2,1} & \dots & x_{2,J} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \dots & x_{n,J} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_J \end{pmatrix}$$

Poly-regression:

$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_1 & \dots & x_1^M \\ 1 & x_2 & \dots & x_2^M \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & \dots & x_n^M \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_M \end{pmatrix}$$
39
Model Training

Given a dataset $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, we find the optimal
polynomial model

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_M x^M$$

as follows:

1. We transform the data by adding new predictors:
   $\tilde{x} = [1, \tilde{x}_1, \tilde{x}_2, \dots, \tilde{x}_M]$, where $\tilde{x}_m = x^m$.

2. We fit the parameters by minimizing the MSE using vector
   calculus, exactly as in multi-linear regression:

$$\hat{\boldsymbol{\beta}} = \left(\tilde{\boldsymbol{X}}^\top \tilde{\boldsymbol{X}}\right)^{-1} \tilde{\boldsymbol{X}}^\top \boldsymbol{y}$$
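
A sketch of these two steps with scikit-learn (hypothetical one-dimensional data; the degree is an arbitrary choice):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical non-linear data
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, size=100)).reshape(-1, 1)
y = 0.5 * x[:, 0] ** 3 - x[:, 0] + rng.normal(0, 1, size=100)

# Step 1: transform x into [x, x^2, x^3] (the intercept is handled by the model)
poly = PolynomialFeatures(degree=3, include_bias=False)
x_poly = poly.fit_transform(x)

# Step 2: fit the coefficients by minimizing the MSE
model = LinearRegression().fit(x_poly, y)
print(model.intercept_, model.coef_)
```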
40
Polynomial Regression (cont)
Fitting a polynomial model requires choosing a degree.

• Degree 1, underfitting: when the degree is too low, the model cannot fit the trend.
• Degree 2: we want a model that fits the trend and ignores the noise.
• Degree 50, overfitting: when the degree is too high, the model fits all the noisy data points.

41
Feature Scaling
Do we need to scale our features for polynomial regression?

Linear regression, $Y = X\beta$, is invariant under scaling: if $X$ is scaled by some number
$\lambda$, then $\beta$ is scaled by $\frac{1}{\lambda}$ and the MSE is identical.

However, if the values of $X$ are very small or very large, we run into trouble. Consider a
polynomial of degree 20 where the maximum or minimum value of a predictor is very large
or very small: those numbers raised to the 20th power will be numerically problematic.

It is therefore always a good idea to scale $X$ when considering polynomial regression:

$$X^{\text{scaled}} = \frac{X - \bar{X}}{\sigma_X}$$

Note: sklearn's StandardScaler() can do this.
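
A sketch of scaling before building polynomial features (hypothetical data; putting both steps in a Pipeline keeps the train/test handling consistent):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical predictor with a large range
rng = np.random.default_rng(0)
x = rng.uniform(0, 10_000, size=(200, 1))
y = 1e-6 * x[:, 0] ** 2 + rng.normal(0, 1, size=200)

# Standardize first, then build polynomial features, then fit
model = make_pipeline(StandardScaler(), PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)
print(model.predict(x[:5]))
```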


42
A high polynomial degree
leads to OVERFITTING!

43
