MLR Eda Model

Uploaded by

hanandeh0791

Multiple Linear Regression - EDA and Model
A Real Study

Background: cIMT (carotid intima-media thickness) is a measure of cardiovascular disease.

Question: What is the relationship between physical activity, cardiovascular fitness, perceived functional ability, and cIMT?
Conclusions: The most predictive variables of cIMT were: age (p = 0.000), gender (p =
0.001), BMI (p = 0.05), SBP (p = 0.000), total cholesterol (p = 0.000), and triglycerides (p =
0.000).

In this unit:

How do we simultaneously use many explanatory variables to understand and predict a response?
Reminder
The process of statistical analysis:

1. Identify research question and the corresponding population and parameter you are
interested in.
2. Collect data.
3. Posit a statistical model based on information in the sample.
4. Draw inference about the population using your model.
Research Objective
Research Question: What determines a person’s height?
Population: All BYU students.
Parameter of Interest:

Some number measuring the “relationship” between height and various other explanatory variables such as father’s height, mother’s height, etc.

Sample: A convenience sample of 1727 BYU students who are in Stat 121.
More Problem Definitions
Response Variable (y): The height of a student.

This is a continuous quantitative variable, meaning it can be any number (including decimals).

Explanatory Variable (x):

Lots! The goal is to relate multiple explanatory variables to a single quantitative response variable.
Variable Encoding
(Part of) Your Student Survey Data
Height  MotherHeight  FatherHeight  SportsInHS  Sex   ShoeSize
70      64            72            Yes         Male  11.0
71      67            72            Yes         Male   9.0
71      65            68            Yes         Male  10.5
70      60            69            Yes         Male  11.0
74      69            72            Yes         Male  11.5
What do we do with the “Yes/No” variables?

Encoding - the process of assigning numerical values to categorical variables.

One-hot encoding (aka dummy variable encoding) - uses 1’s and 0’s.
Yes=1, No=0 or Female=0, Male=1 (the category that comes first alphabetically is coded 0).
Much more on this in later stats classes, but we’ll keep it simple here.
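The encoding rule above can be sketched in a few lines of Python. This is only an illustration: the three rows are made up (they are not survey records), `dummy_encode` is a hypothetical helper, and the column names `SportsInHSYes` and `SexMale` are borrowed from the fitted model output later in the deck.

```python
# Made-up rows in the same shape as the survey data (illustration only).
rows = [
    {"Height": 70, "SportsInHS": "Yes", "Sex": "Male"},
    {"Height": 64, "SportsInHS": "No", "Sex": "Female"},
    {"Height": 66, "SportsInHS": "Yes", "Sex": "Female"},
]


def dummy_encode(value, baseline):
    """Return 0 for the baseline category and 1 for the other category."""
    return 0 if value == baseline else 1


# "No" and "Female" come first alphabetically, so they are the baselines (0).
encoded = [
    {
        "Height": r["Height"],
        "SportsInHSYes": dummy_encode(r["SportsInHS"], baseline="No"),
        "SexMale": dummy_encode(r["Sex"], baseline="Female"),
    }
    for r in rows
]

for r in encoded:
    print(r)
```

Each Yes/No or Male/Female column becomes a single 0/1 column, which is what lets it sit in the regression next to the quantitative variables.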
EDA Tool #1 - Plots
EDA Tool #2 - Correlations

Reminder on Properties of Correlation (r):

−1 ≤ r ≤ 1
Only appropriate for LINEAR relationships
NOT impacted by scale of data (scale invariant).
Highly impacted by outliers
Cor(X, Y) = Cor(Y, X)
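A small Python sketch can check these properties numerically. It uses the five MotherHeight and Height values from the survey excerpt; the `correlation` helper and the extra outlier point (90, 50) are assumptions added for illustration.

```python
import math


def correlation(x, y):
    """Pearson correlation r between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


mother = [64, 67, 65, 60, 69]   # MotherHeight, inches (survey excerpt)
height = [70, 71, 71, 70, 74]   # Height, inches (survey excerpt)

r = correlation(mother, height)

# Symmetry: Cor(X, Y) = Cor(Y, X)
r_swapped = correlation(height, mother)

# Scale invariance: converting heights to centimeters leaves r unchanged.
height_cm = [2.54 * h for h in height]
r_cm = correlation(mother, height_cm)

# Outliers: one made-up extreme point changes r drastically.
r_outlier = correlation(mother + [90], height + [50])

print(r, r_swapped, r_cm, r_outlier)
```

With only five points the value of r itself means little; the point is that swapping the variables or changing units leaves it unchanged, while a single extreme point flips its sign.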
Using the Analysis Tool
Multiple Linear Regression Model
In specifying a model for the population relationship between height and all the explanatory variables, we want to:

1. include all the variables at the same time,

2. keep it linear in all the variables at the same time,
3. account for the fact that the relationship in the data is not perfect.
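Multiple Linear Regression Model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_P X_{Pi} + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2)$$

where:

$X_{1i}$ is the $i$th observation of the 1st explanatory variable

E.g., $X_{13}$ is the mother’s height for the 3rd observation in our dataset.
$P$ = total number of explanatory variables you have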
Multiple Linear Regression Model

How do we interpret $\beta_0, \beta_1, \dots, \beta_5$ ($\beta_1, \dots, \beta_5$ are called “slopes” or “effects”; $\beta_0$ is the intercept)?

$\beta_1$ (MotherHeight): Holding everything else constant (or all else being equal), as the height of the mother goes up by 1 inch, we expect height to go up by $\beta_1$ inches on average.

$\beta_2$ (FatherHeight): Holding everything else constant (or all else being equal), as the height of the father goes up by 1 inch, we expect height to go up by $\beta_2$ inches on average.

$\beta_3$ (Sports): Holding everything else constant (or all else being equal), students who played sports in high school are expected to be $\beta_3$ inches taller than those who didn’t.
Multiple Linear Regression Model

How do we interpret $\beta_0, \beta_1, \dots, \beta_5$ ($\beta_1, \dots, \beta_5$ are called “slopes” or “effects”; $\beta_0$ is the intercept)?

$\beta_5$ (shoe size): Holding everything else constant (or all else being equal), as shoe size increases by 1, we expect height to go up by $\beta_5$ inches on average.

$\beta_0$: For female students whose parents are 0 inches tall, who did not play sports in HS, and who wear a size 0 shoe, we expect height to be $\beta_0$ inches on average.
Assumptions of the MLR Model
Easy way to remember what we are assuming about the population in a multiple linear
regression model:

L - Linear relationship between y and all the quantitative x’s simultaneously

I - Independence (one obs. doesn’t impact the other)
N - Normal residuals (distance from “line” is normal)
E - Equal spread of residuals around the “line”

More on why these assumptions are important and how to check these in the next
subunit.
Parameter Estimation
Parameters we want to estimate: $\beta_0, \beta_1, \dots, \beta_P$ (which define the line) and $\sigma$ (so we know how spread out things are)
Goal: Find the predictions that come “closest” to the data points.
Parameter Estimation
What do we mean by “line closest to points”? We want to find $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_P$ so that

$$\sum_{i=1}^{n} (\text{Obs}_i - \text{Pred}_i)^2 = \sum_{i=1}^{n} \left(Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_P X_{Pi})\right)^2 = \sum_{i=1}^{n} (\text{residual}_i)^2$$

is as small as possible. This is called the least squares regression line.

A few notes:

1. We square the distances so that points above and below the line both count as positive and do not cancel out.
2. We sum squared residuals because we look at all the data.
3. We use “hats” to denote estimates from the sample (for example, $\hat{\beta}_1$ is our estimate of $\beta_1$).
4. We include all the explanatory variables simultaneously.
Parameter Estimation
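How do we find $\hat{\beta}_0, \dots, \hat{\beta}_P$ that minimize

$$\sum_{i=1}^{n} (\text{Obs}_i - \text{Line}_i)^2 = \sum_{i=1}^{n} \left(Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_P X_{Pi})\right)^2 = \sum_{i=1}^{n} (\text{residual}_i)^2 \, ?$$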

1. Guess and check

2. Use calculus

In both cases, we’ll let the computer do the hard work for us.
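To make the two approaches concrete, here is a minimal Python sketch. It uses only the five survey rows shown earlier, with two explanatory variables (MotherHeight and ShoeSize), so its estimates will not match the full-sample output. The helpers `sse` and `least_squares` are hypothetical names. Solving the normal equations is one standard way the calculus approach is carried out, and is not necessarily what the course tool does internally.

```python
def sse(beta, X, y):
    """Sum of squared residuals for coefficients beta (intercept first)."""
    total = 0.0
    for row, obs in zip(X, y):
        pred = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        total += (obs - pred) ** 2
    return total


def least_squares(X, y):
    """Solve the normal equations (A'A) beta = A'y by Gauss-Jordan elimination."""
    A = [[1.0] + list(row) for row in X]          # prepend the intercept column
    k = len(A[0])
    # Augmented matrix [A'A | A'y]
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(k)]
        + [sum(a[i] * obs for a, obs in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# (MotherHeight, ShoeSize) and Height from the survey excerpt.
X = [(64, 11.0), (67, 9.0), (65, 10.5), (60, 11.0), (69, 11.5)]
y = [70, 71, 71, 70, 74]

guess = [20.0, 0.5, 1.5]            # "guess and check": an arbitrary candidate
beta_hat = least_squares(X, y)      # "use calculus": the least squares solution

print("SSE for the guess:    ", sse(guess, X, y))
print("SSE for least squares:", sse(beta_hat, X, y))
print("Estimates:", beta_hat)
```

Any guess can be scored with `sse`, but no guess can score lower than the least squares solution, which is why the computer goes straight to it.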
The Fitted MLR Model
Fitted MLR Model Output
               Estimate     Std. Error  t value    Pr(>|t|)
(Intercept)    23.2568815   1.2804735   18.162720  0.0000000
MotherHeight    0.2825800   0.0162393   17.400990  0.0000000
FatherHeight    0.2104869   0.0148010   14.221140  0.0000000
SportsInHSYes   0.3482241   0.1083163    3.214883  0.0013292
SexMale         3.1944662   0.1264956   25.253583  0.0000000
ShoeSize        1.0635041   0.0365141   29.125834  0.0000000
Fitted Regression Line Equation:

$$\hat{y}_i = 23.26 + 0.28 \times \text{MotherHeight}_i + 0.21 \times \text{FatherHeight}_i + 0.35 \times \text{Sports}_i + 3.19 \times \text{Sex}_i + 1.06 \times \text{ShoeSize}_i$$
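As a quick check on how the fitted equation is used, the sketch below plugs the first row of the survey excerpt into it, using the rounded coefficients. The function name `predict_height` and the 0/1 coding of its arguments are assumptions that follow the encoding slide (Yes=1, Male=1).

```python
# Rounded coefficients from the fitted regression equation.
COEF = {
    "Intercept": 23.26,
    "MotherHeight": 0.28,
    "FatherHeight": 0.21,
    "SportsInHSYes": 0.35,
    "SexMale": 3.19,
    "ShoeSize": 1.06,
}


def predict_height(mother, father, sports, male, shoe):
    """Predicted height in inches; sports and male are 0/1 dummy variables."""
    return (
        COEF["Intercept"]
        + COEF["MotherHeight"] * mother
        + COEF["FatherHeight"] * father
        + COEF["SportsInHSYes"] * sports
        + COEF["SexMale"] * male
        + COEF["ShoeSize"] * shoe
    )


# First row of the excerpt: mother 64, father 72, sports Yes, male, shoe size 11.
first = predict_height(64, 72, 1, 1, 11.0)
print(round(first, 2))                 # about 71.5 inches (observed height: 70)

# Same student with shoe size one unit larger, everything else held constant.
bigger_shoe = predict_height(64, 72, 1, 1, 12.0)
print(round(bigger_shoe - first, 2))   # the ShoeSize coefficient, 1.06
```

The second print shows what "holding everything else constant" means in practice: changing only shoe size by 1 changes the prediction by exactly the shoe size coefficient.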
The Fitted MLR Model
How do we interpret $\hat{\beta}_0 = 23.257$?

For female students with 0 inch tall parents who did not play sports in HS and wear a size 0 shoe, we expect their height to be 23.257 inches on average.

How do we interpret $\hat{\beta}_3 = 0.348$ (sports)?

All else being equal, students who played sports in high school are 0.348 inches taller, on average.

How do we interpret $\hat{\beta}_5 = 1.064$ (shoe size)?

Holding everything else constant (or all else being equal), as the shoe size goes up by 1, we expect height to go up by 1.064 inches on average.
Using the Analysis Tool

Fitted regression equation:

$$\hat{y} = 349.2369 - 5.495 \times \text{Lat} + 21.7976 \times \text{Ocean} + 0.1219 \times \text{Long}$$
Visualizing the Fitted MLR Model
When we only had 1 explanatory variable, we could visualize the fitted model:

But we can’t do that here because we have multiple explanatory variables that all work
together.
Visualizing the Fitted MLR Model
Added variable plots (also known as partial regression plots):

Intuition: Make a scatterplot of one x vs y AFTER “adjusting” for the other x’s (math
detail beyond this course so we’ll just let the computer do it for us).
Parameter Estimation
An estimate of $\sigma$ is more complicated to explain (take more stats courses), so for purposes of this class, the computer estimates it for us.

$\hat{\sigma} = 1.776$

How do we interpret $\hat{\sigma}$?
On average, the actual heights are about 1.776 inches away from the estimated heights.
Is this “better” or “worse” than if we just included mother’s height?
$\hat{\sigma} = 3.776$ if we only use mother’s height.
It’s hard to tell just from $\hat{\sigma}$ how good a model is. A better measure is $R^2$.
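For the curious, here is a hedged sketch of how such an estimate is commonly computed. The slides leave the formula out; the version below is the usual residual standard error, the square root of the sum of squared residuals divided by n − P − 1, and is assumed rather than taken from the course. The predicted values are made up for illustration and do not come from the fitted model.

```python
import math


def residual_std_error(observed, predicted, n_predictors):
    """sqrt(SSE / (n - P - 1)), the usual estimate of sigma in regression."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    df = len(observed) - n_predictors - 1   # residual degrees of freedom
    if df <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return math.sqrt(sse / df)


observed = [70, 71, 71, 70, 74]              # heights from the survey excerpt
predicted = [70.5, 70.5, 71.5, 70.5, 73.0]   # made-up predictions (illustration only)

sigma_hat = residual_std_error(observed, predicted, n_predictors=1)
print(sigma_hat)
```

A smaller value means the predictions sit closer to the observed heights, which is the comparison the slide makes between 1.776 and 3.776.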
Assessing Model Fit
Mathematical formula:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_P X_{Pi})\right)^2}{\sum_{i=1}^{n} (Y_i - \bar{y})^2} = 0.808$$

Intuition:

Formal interpretation: The percent of variability in $Y$ that is explained by all X’s simultaneously.
$R^2$ is between 0 and 1, with 1 meaning the explanatory variables perfectly explain the response.
$R^2$ is a percentage grade on how well all the X’s are doing in telling us about $Y$.
For our study, 80.8% of the variation in students’ heights can be explained by mother’s height, father’s height, whether they played sports in HS, biological sex and shoe size.
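The formula can be sketched directly in Python. The helper name `r_squared` and the made-up predictions are assumptions for illustration; the 0.808 reported above comes from the full survey sample, not from this snippet.

```python
def r_squared(observed, predicted):
    """1 - SSE/SST: share of the variability in observed explained by predicted."""
    mean_y = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_y) ** 2 for o in observed)
    return 1 - sse / sst


observed = [70, 71, 71, 70, 74]              # heights from the survey excerpt

# Perfect predictions explain everything.
perfect = r_squared(observed, observed)

# Predicting the overall mean for everyone explains nothing.
mean_height = sum(observed) / len(observed)
mean_only = r_squared(observed, [mean_height] * len(observed))

# Made-up predictions (illustration only) land somewhere in between.
in_between = r_squared(observed, [70.5, 70.5, 71.5, 70.5, 73.0])

print(perfect, mean_only, in_between)
```

The two extreme cases line up with the "percentage grade" reading: 1 when the X’s reproduce every response exactly, 0 when they do no better than the plain average.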
Using the Analysis Tool
Additional MLR Practice
Measuring possum head size can be difficult. What is the relationship between possum head size and sex, age, skull width, total length and tail length? Use a multiple linear regression model (and the course app) to answer the following questions:

1. What is the estimated head size for a newborn, female possum with 0 skull width,
length and tail length?
2. How much should head size go up (or down) as the possum gets 1 cm bigger?
3. How much are male head sizes bigger (or smaller) than female head sizes (on average)?
4. On average, how far away are true head sizes from estimated head sizes?
5. How well do the explanatory variables explain head size?
Additional MLR Practice
1. What is the estimated head size for a newborn, female possum with 0 skull width, length and tail length?

$\hat{\beta}_0 = 33.4974481$
2. How much should head size go up (or down) as the possum gets 1 cm bigger?

$\hat{\beta}_1 = 0.4528877$
3. How much are male head sizes bigger (or smaller) than female head sizes (on average)?

$\hat{\beta}_2 = 1.1695384$
4. On average, how far away are true head sizes from estimated head sizes?
This is $\hat{\sigma} = 2.080432$
5. How well do the explanatory variables explain head size?
This is $R^2 = 0.669$
Homework Choices for Unit 7
Same as Unit 6 but we’re going to add more variables to the regression:

1. Rate my professor - what matters in determining a rate my professor score?

2. Supervisor - what makes people like their manager?
3. Body Fat - what body measurements are predictive of your BMI?
4. Basketball Salary - what skills lead to a higher salary?
Key Terminology
EDA for MLR
Multiple linear regression model
$R^2$
Interpretation of Coefficients
Added-variable Plots
Least squares estimation
