Module 3 - SimpleLinearRegression - Afterclass1b

This document provides an overview of using linear regression to predict wine quality. It describes how Orley Ashenfelter, a Princeton economics professor, used variables such as weather conditions (average growing season temperature and rainfall), wine age, and French population to build a linear regression model for predicting the prices of the 1952-1978 Bordeaux vintages, which serve as a proxy for wine quality. The document defines key concepts such as the regression function, intercept, slope, residuals, and the ordinary least squares criterion used to estimate the regression coefficients.


IIMT 2641 Introduction to Business Analytics

Module 3: Linear Regression


Topic 1: Simple Linear Regression

Bordeaux wine

§ Large differences in price and quality between years, although the wine is produced in a similar way
§ Meant to be aged, so hard to tell if the wine will be good when it is on the market
§ Expert tasters predict which ones will be good
§ Can analytics be used to come up with a different system for judging wine?
Predicting the quality of wine

§ March 1990 - Orley Ashenfelter, a Princeton economics professor, claims he can predict wine quality without tasting the wine

Building a model

§ Ashenfelter used a method called linear regression
– Predicts an outcome variable, or dependent variable
– Predicts using a set of independent variables

Building a model
§ Dependent variable:
– Typical price in 1990-1991 wine auctions (approximates quality)
– Conduct a logarithmic transformation
q Gives a better linear fit

§ Independent variables:
– Age of wine (in 1990)
q Older wines are more expensive
– Weather
q Average Growing Season Temperature (AGST)
q Harvest Rain
q Winter Rain
– Population of France

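To make this concrete, here is a minimal Python sketch of assembling these variables; the file name wine.csv and the column names are assumptions for illustration, not something specified on the slides.

```python
# Sketch: prepare the dependent and independent variables (assumed file/columns).
import numpy as np
import pandas as pd

wine = pd.read_csv("wine.csv")                # hypothetical file, one row per vintage

# Dependent variable: log of the auction price (the log gives a better linear fit).
wine["LogPrice"] = np.log(wine["Price"])

# Independent variables named on this slide (assumed column names).
predictors = ["Age", "AGST", "HarvestRain", "WinterRain", "FrancePop"]
print(wine[["LogPrice"] + predictors].head())
```
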
The wine data (1952 - 1978)

Quick Question: What is the relationship between harvest rain, average growing season temperature, and wine prices?
Baseline model (?)

Baseline model (Take the mean)
(Figure: the data points with a horizontal line at the mean ȳ, the baseline prediction.)
One-Variable Linear Regression
(Figure: scatter plot of the data with a fitted regression line.)
Simple Regression Model
The population model of y with one predictor variable x is:

y = β0 + β1·x + ε

§ y is the dependent variable (DV)
§ x is the independent variable (IV)
§ Regression Function
– E[Y|x] = β0 + β1·x is the mean of Y given x
§ β0 is the y-intercept (the value of E[Y|x] when x = 0)
§ β1 is the slope for x, which is the change in E[Y|x] for a unit increase in x
§ Random errors ε (not required)
– The random errors are a random sample from N(0, σ²), i.e. i.i.d. (independent and identically distributed) random variables
– Each observation has its own random error
– The regression output does not show the random errors, but it does estimate their standard deviation σ
– The random errors ε and the IV (x) are uncorrelated
– These assumptions are important for effective business analytics
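
As a quick illustration of these assumptions, the sketch below simulates the population model with i.i.d. normal errors; the values β0 = 1, β1 = 2, and σ = 1 are arbitrary choices for the demo, not from the slides.

```python
# Simulate y = beta0 + beta1*x + eps with eps ~ N(0, sigma^2), i.i.d.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 1.0       # arbitrary demo values
x = rng.uniform(0, 10, size=100)          # independent variable
eps = rng.normal(0, sigma, size=100)      # each observation gets its own random error
y = beta0 + beta1 * x + eps               # dependent variable

print(y[:5])
```
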
Estimated Regression Function
§ Estimate the regression model with n observations (xi, yi) for i = 1, …, n
§ The estimated or predicted value of y given x is:

ŷ = b0 + b1·x

§ b0 is the sample estimate of the population intercept β0
§ b1 is the sample estimate of the population slope β1
– b0 and b1 are sample statistics (similar to x̄) and have sampling distributions
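
A minimal sketch of computing the sample estimates b0 and b1 on simulated data; np.polyfit with degree 1 fits a least squares line.

```python
# Estimate b0 and b1 from a sample generated by the population model above.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=100)   # demo data with beta0=1, beta1=2

b1, b0 = np.polyfit(x, y, deg=1)          # least squares slope and intercept
y_hat = b0 + b1 * x                       # predicted value of y given x
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")    # estimates of the population values 1 and 2
```

Rerunning with a different seed gives slightly different b0 and b1, which is the sampling-distribution point made above.
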
One-Variable Linear Regression

(Figure: data points with the estimated regression line ŷ = b0 + b1·x.)
Data and Predicted Values
§ What is the observed y when x = 1?


§ What is the predicted y when x = 1?


§ What is the observed y when x = 4?



§ What is the predicted y when x = 4?

Data and Predicted Values
§ What is the observed y when x = 1?

y=6

§ What is the predicted y when x = 1?


ŷ = 1 + (2)(1) = 3

§ What is the observed y when x = 4?

y=4

§ What is the predicted y when x = 4?

ŷ = 1 + (2)(4) = 9

Estimated Model and Residuals

§ Residuals are the difference between the observed values of y and the predicted values ŷ
– r = y − ŷ
– Each observation has one observed y, one predicted ŷ, and one residual r.
§ The residuals are errors between the observed and predicted values.

(Figure: scatter plot with the fitted line; each residual ri = yi − ŷi is the vertical distance from the observed point to the line.)
Computing Residuals

(Figure: the four data points and the line ŷ = 1 + 2x, with residuals r1, …, r4 marked.)

§ What is the residual r2 at x = 2?

§ What is the residual r3 at x = 3?

Computing Residuals

§ What is the residual r2 at x = 2?

r2 = y2 − ŷ2 = 3 − (1 + 2∗2) = 3 − 5 = −2

§ What is the residual r3 at x = 3?

r3 = y3 − ŷ3 = 11 − (1 + 2∗3) = 11 − 7 = 4
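
These answers can be checked in a few lines; the four data points and the line ŷ = 1 + 2x come from the worked example on these slides.

```python
# Residuals for the worked example: the line y_hat = 1 + 2x over four observations.
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([6, 3, 11, 4])               # observed values from the example
y_hat = 1 + 2 * x                         # predicted values on the line
r = y - y_hat                             # one residual per observation

print(r)                                  # [ 3 -2  4 -5]; r2 = -2 and r3 = 4 as above
```
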
Ordinary Least Squares (OLS) Criterion
The least squares line finds the estimates b0 and b1 of the coefficients to minimize the sum of squared errors (SSE) for a sample {(xi, yi)} with n observations:

SSE(b0, b1) = Σi (yi − ŷi)², where ŷi = b0 + b1·xi for i = 1, …, n

Why squared? Because positive and negative residuals cancel, the plain sum of residuals could be zero even for a poor fit.

Setting both partial derivatives to zero,

∂SSE(b0, b1)/∂b0 = 0 and ∂SSE(b0, b1)/∂b1 = 0,

yields the least squares estimates:

b1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²
b0 = ȳ − b1·x̄

where x̄ is the sample average of the independent variable and ȳ is the sample average of the dependent variable.

(Do not need to memorize.)
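
The closed-form formulas translate directly into numpy; this sketch reuses the four example points, so it is a demonstration of the algebra rather than anything to memorize.

```python
# Closed-form OLS estimates b0, b1 that minimize SSE (mirrors the formulas above).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([6.0, 3.0, 11.0, 4.0])       # example data from the earlier slides

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

sse = np.sum((y - (b0 + b1 * x)) ** 2)    # the minimized sum of squared errors
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}, SSE = {sse:.3f}")
```

Note that this least squares line differs from the illustrative line ŷ = 1 + 2x used earlier: OLS picks the intercept and slope with the smallest SSE for this sample.
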
Estimate a linear model (One Variable)

Test H0: AGST Coefficient = 0 versus HA: AGST Coefficient ≠ 0.

The regression output reports:
• Estimated standard errors for the estimated intercept and slope coefficients
• t-score = (Estimated Coefficient − 0)/(Standard Error)
• Two-tail test: p-value = 2∗P(T < −|t-score|)
• Coefficient of Determination: R-Squared
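
Output of this kind can be reproduced with, for example, statsmodels; this is a sketch assuming the hypothetical wine.csv file and column names used earlier.

```python
# Sketch: one-variable regression output (coefficients, std. errors, t, p, R-squared).
import numpy as np
import pandas as pd
import statsmodels.api as sm

wine = pd.read_csv("wine.csv")            # hypothetical file, as before
y = np.log(wine["Price"])                 # LogPrice
X = sm.add_constant(wine["AGST"])         # adds the intercept column

model = sm.OLS(y, X).fit()
print(model.summary())                    # std. errors, t-scores, p-values, R-squared
```
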
One-Variable Linear Regression

ŷ = −3.4178 + 0.6351∗AGST
Estimate a linear model
(One Variable )
• Estimated model for price:
ŷ = −3.4178 + 0.6351∗AGST

• The predicted LogPrice increases by 0.6351 for every 1 degree increase in average growing season temperature.
• If AGST = 15, then ŷ = ?
• If AGST = 18, then ŷ = ?
• If AGST = 20, then ŷ = ?
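
As a check on the three questions above, each AGST value can be plugged into the estimated equation:

```python
# Predictions from the estimated model LogPrice_hat = -3.4178 + 0.6351 * AGST.
for agst in (15, 18, 20):
    log_price_hat = -3.4178 + 0.6351 * agst
    print(f"AGST = {agst}: predicted LogPrice = {log_price_hat:.4f}")
```
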
T-Tests for the Coefficients: H0: bj = 0 versus HA: bj ≠ 0
Two-Tail Test for the Slope (Very important. Can you predict Y from X?)

H0: b1 = 0 versus HA: b1 ≠ 0
• t-score = (coefficient − 0)/(std. error)
• t-score = (0.6351 − 0)/0.1509 = 4.208
• p-value = 2∗P(T < −|4.208|) = 2∗t.dist(−4.208, 23, 1) < 0.001
• df = n − 1 − #IV = 25 − 1 − 1 = 23 (the error df shown under "Sum of Squares" in the output)
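
The t-score and two-tail p-value can be reproduced with scipy.stats; the coefficient 0.6351, standard error 0.1509, and df = 23 are the values from this slide.

```python
# Two-tail t-test for the slope: H0: b1 = 0 versus HA: b1 != 0.
from scipy import stats

coef, se, df = 0.6351, 0.1509, 23         # slope estimate, std. error, n - 1 - #IV
t_score = (coef - 0) / se                 # = 4.208
p_value = 2 * stats.t.cdf(-abs(t_score), df)

print(f"t = {t_score:.3f}, p = {p_value:.5f}")  # p < 0.001, so reject H0
```
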
How well the model fits data
§ The simplest commonly used measure of fit is R² (the coefficient of determination): R² = 1 − SSE/SST
– SSE = Σi (yi − ŷi)²: sum of squared errors
q Variation of Y that cannot be explained by the regression
– SST = Σi (yi − ȳ)²: total sum of squares
q Total amount of variation of Y around its mean
q "Error" generated by a baseline model without any inputs
– Decomposition of variation of Y:

Σi (yi − ȳ)² = Σi (yi − ŷi)² + Σi (ŷi − ȳ)²
Total variation = Unexplained variation + Explained variation

§ R² is the proportion of the variance in the DV explained by the regression model.
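
R² follows directly from these definitions; this sketch reuses the four-point example from the residual slides.

```python
# R^2 = 1 - SSE/SST, computed from the definitions above.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([6.0, 3.0, 11.0, 4.0])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)            # unexplained variation
sst = np.sum((y - y.mean()) ** 2)         # total variation around the mean (baseline)
print(f"R^2 = {1 - sse / sst:.3f}")       # share of variation explained by the model
```

For these four scattered points R² comes out close to 0, i.e. the regression explains almost none of the variation, which is consistent with how weak the linear pattern is.
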
Coefficient of Determination: R-Squared
• R-Squared is a measure of fit
• A bigger R-Squared indicates a better fit, all else being equal
• 43.5% of the variation of prices is explained by the simple regression on AGST
• 0 ≤ R-Squared ≤ 1
Use each variable on its own
§ R² = 0.44 using Average Growing Season Temperature (variable significant at the 0.001 level)
§ R² = 0.32 using Harvest Rain (variable significant at the 0.01 level)
§ R² = 0.22 using France Population (variable significant at the 0.05 level)
§ R² = 0.20 using Age (variable significant at the 0.05 level)
§ R² = 0.02 using Winter Rain (not significant)
§ Multivariate linear regression allows us to use more than one variable to potentially improve our predictive ability.
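
These one-variable comparisons could be reproduced with a short loop; as before, the wine.csv file and column names are assumptions for illustration.

```python
# Fit LogPrice on each candidate variable by itself and compare R-squared.
import numpy as np
import pandas as pd
import statsmodels.api as sm

wine = pd.read_csv("wine.csv")            # hypothetical file, as before
y = np.log(wine["Price"])

for var in ["AGST", "HarvestRain", "FrancePop", "Age", "WinterRain"]:
    fit = sm.OLS(y, sm.add_constant(wine[var])).fit()
    print(f"{var}: R^2 = {fit.rsquared:.2f}, p-value = {fit.pvalues[var]:.3f}")
```
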
