Advanced Statistics II
(PST 22209 / FST 22209 / ESNRM 22209)
R.M. KAPILA RATHNAYAKA
B.Sc. Special (Math. & Stat.) (Ruhuna), M.Sc. (Industrial Mathematics) (USJ),
M.Sc. (Stat.) (WHUT, China),
Ph.D. (Applied Statistics, WHUT)
Why do we need an alternative method to linear regression?
Polynomial Regression
• In situations where the functional relationship between the
response Y and the independent variable x cannot be
adequately approximated by a linear relationship, it is
sometimes possible to obtain a reasonable fit by considering a
polynomial relationship of the form
$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_h x^h + \varepsilon$$
• where $\beta_0, \beta_1, \ldots, \beta_h$ are regression coefficients that would have to be estimated.
• h is called the degree of the polynomial.
• The least-squares estimators minimize the sum of squares
$$SS = \sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_i - \cdots - \beta_h x_i^h\right)^2$$
• To determine these estimators, we take partial derivatives with respect to $\beta_0, \beta_1, \ldots, \beta_h$ of the foregoing sum of squares, and then set these equal to 0 so as to determine the minimizing values.
• On doing so, and then rearranging the resulting equations, we obtain that the least-squares estimators satisfy the following set of linear equations, called the normal equations:
$$\sum_i y_i = n\beta_0 + \beta_1 \sum_i x_i + \cdots + \beta_h \sum_i x_i^h$$
$$\sum_i x_i y_i = \beta_0 \sum_i x_i + \beta_1 \sum_i x_i^2 + \cdots + \beta_h \sum_i x_i^{h+1}$$
$$\vdots$$
$$\sum_i x_i^h y_i = \beta_0 \sum_i x_i^h + \beta_1 \sum_i x_i^{h+1} + \cdots + \beta_h \sum_i x_i^{2h}$$
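As a quick illustration (not from the original notes), the sketch below fits a degree-h polynomial by least squares with NumPy; np.polyfit solves exactly the minimization described above. The data are the five (x, y) pairs used in the application example later in this section.

```python
import numpy as np

# Fit a degree-h polynomial by least squares; np.polyfit minimizes
# sum_i (y_i - b0 - b1*x_i - ... - bh*x_i^h)^2, the sum of squares above.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 2.0, 6.0, 12.0])

h = 2                                  # degree of the polynomial
coeffs = np.polyfit(x, y, deg=h)       # coefficients, highest power first
print(coeffs)                          # approx. [ 1. -1.  0.], i.e. y = x^2 - x
```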
Degree of the polynomial
• where h is called the degree of the polynomial. For lower degrees, the relationship has a specific name:
• h = 2 is called quadratic
• h = 3 is called cubic,
• h = 4 is called quartic, and so on.
Second-degree Polynomial – Quadratic Trend
• Practically, many real-world data patterns are best described by curves, not straight lines. In these instances, the linear trend model does not adequately describe the change in the variable as time changes.
• To overcome this problem, we often use a parabolic curve, which is described mathematically by a second-degree equation.
• The general form for an estimated second-degree equation is
$$\hat{y} = a + bx + cx^2$$
• where
– $\hat{y}$ = estimate of the dependent variable
– $a$, $b$, $c$ = numerical constants
• We can determine the values of the numerical constants from the following three equations:
$$\sum y = na + b\sum x + c\sum x^2$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$$
$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$
Second-degree Polynomial – Quadratic Trend: Applications
• Fit a quadratic polynomial to the following data.

X   Y
0   0
1   0
2   2
3   6
4  12
Example
• For these data, the required sums are
$$n = 5,\quad \sum x = 10,\quad \sum x^2 = 30,\quad \sum x^3 = 100,\quad \sum x^4 = 354,$$
$$\sum y = 20,\quad \sum xy = 70,\quad \sum x^2 y = 254.$$
• Substituting into the three equations above gives
$$20 = 5a + 10b + 30c$$
$$70 = 10a + 30b + 100c$$
$$254 = 30a + 100b + 354c$$
• Solving this system yields $a = 0$, $b = -1$ and $c = 1$.
• The estimated quadratic regression equation is
$$\hat{y} = x^2 - x$$
Matrix notation to solve the equation system
• The normal equations can be written in matrix form as
$$(X'X)\,b = X'y,$$
• which has the solution
$$b = (X'X)^{-1}X'y.$$
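A minimal sketch of this matrix solution, applied to the quadratic-trend data from the worked example above:

```python
import numpy as np

# Solve the three normal equations via the matrix form b = (X'X)^(-1) X'y.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 2.0, 6.0, 12.0])

X = np.column_stack([np.ones_like(x), x, x**2])  # columns: 1, x, x^2
b = np.linalg.solve(X.T @ X, X.T @ y)            # [a, b, c]
print(b)                                         # approx. [0., -1., 1.]
```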
Example 2
• You are studying the relationship between a particular
machine setting and the amount of energy consumed.
• A log transformation of the response variable will produce a more symmetric error distribution.
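A hedged sketch of the idea, with hypothetical setting/energy values standing in for the data described above; the response is log-transformed before fitting:

```python
import numpy as np

# Hypothetical machine settings and energy readings (illustrative only).
setting = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
energy = np.array([21.0, 29.0, 42.0, 60.0, 83.0, 118.0])

log_energy = np.log(energy)                        # transform the response
slope, intercept = np.polyfit(setting, log_energy, deg=1)
print(intercept, slope)                            # linear fit on the log scale
```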
Multiple Linear Regression
• Multiple regression is an extension of simple linear
regression.
• It is used when we want to predict the value of a variable
based on the value of two or more other variables.
• Suppose that we have a linear model
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
Example
• You could use multiple regression to understand whether
exam performance can be predicted based on
– revision time,
– test anxiety,
– lecture attendance
– gender.
• Alternatively, you could use multiple regression to understand
whether daily cigarette consumption can be predicted based
on
– smoking duration,
– age when started smoking,
– smoker type,
– income
– gender.
Assumption #1:
• The dependent variable should be measured on a continuous scale (i.e., it is either an interval or ratio variable).
• Examples:
– revision time (measured in hours),
– intelligence (measured using IQ score),
– exam performance (measured from 0 to 100),
– weight (measured in kg)
Assumption #2:
• Two or more independent variables, which can be either
continuous (i.e., an interval or ratio variable) or categorical
(i.e., an ordinal or nominal variable).
• Examples of nominal variables include:
– gender (male and female),
– ethnicity (Caucasian, African American and Hispanic),
– physical activity level (sedentary, low, moderate and high),
– profession (surgeon, doctor, nurse, dentist, therapist).
Numerical Data (Data that is Numbers): Continuous Random Variables
• Continuous Variable –
A continuous variable is a variable whose value is obtained by measuring.
• Examples:
– height of students in class
– weight of students in class
– time it takes to get to school
– distance traveled between classes
Numerical Data (Data that is Numbers): Discrete Random Variables
• A discrete variable is a variable whose value is obtained by
counting.
• All continuous variables are numeric, but not all numeric
variables are continuous.
• Examples:
– number of students present
– number of red marbles in a jar
– number of heads when flipping three coins
– students’ grade level
Categorical Data (Data that is not numbers): Nominal Variable
• Sometimes there is no hierarchy in categorical data.
• If eye colour were coded
– 0 = "Blue"
– 1 = "Green"
– 2 = "Brown"
we would have to choose arbitrarily which option gets which number.
• It doesn't matter whether Blue is coded as zero, one, or two, because there is no hierarchy in eye colour.
Categorical Data (Data that is not numbers): Ordinal Variable
• Annoying surveys often ask you to answer with the options
“Strongly Disagree”, “Disagree”, “Neutral”, “Agree” or
“Strongly agree”.
• These data have a special structure, because the categories have a natural order; they can be coded from 0 ("Strongly Disagree") to 4 ("Strongly agree"):
– 0 = Strongly Disagree
– 1 = Disagree
– 2 = Neutral
– 3 = Agree
– 4 = Strongly agree
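A small illustrative sketch (assuming pandas is available) of coding such answers as an ordered categorical variable, so the 0-4 codes keep their natural order:

```python
import pandas as pd

# Ordered categories: the integer codes respect the agreement scale,
# unlike the arbitrary nominal codes used for eye colour above.
levels = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
answers = pd.Categorical(
    ["Agree", "Neutral", "Strongly agree", "Disagree"],  # hypothetical responses
    categories=levels,
    ordered=True,
)
print(answers.codes)  # [3 2 4 1] -- codes follow the ordering of the levels
```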
Assumption #3:
• Your data needs to show homoscedasticity, which is where the
variances along the line of best fit remain similar as you move
along the line.
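One common way to check this assumption is to plot residuals against fitted values and look for a roughly constant vertical spread. A minimal sketch with simulated data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulate data with constant error variance, fit a line, and plot
# residuals against fitted values; an even band around 0 supports
# homoscedasticity, while a funnel shape suggests heteroscedasticity.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)

b1, b0 = np.polyfit(x, y, deg=1)       # slope first, then intercept
fitted = b0 + b1 * x
residuals = y - fitted

plt.scatter(fitted, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```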
Assumption #4:
• Data must not show multicollinearity, which occurs when
you have two or more independent variables that are highly
correlated with each other.
What is Multicollinearity?
Consider the following data on 20 individuals with high blood pressure:
1. blood pressure (y = BP, in mm Hg)
2. age (x1 = Age, in years)
3. weight (x2 = Weight, in kg)
4. body surface area (x3 = BSA, in sq m)
5. duration of hypertension (x4 = Dur, in years)
6. basal pulse (x5 = Pulse, in beats per minute)
7. stress index (x6 = Stress)
        BP     Age    Weight  BSA    Dur    Pulse
Age     0.659
Weight  0.950  0.407
BSA     0.866  0.378  0.875
Dur     0.293  0.344  0.201   0.131
Pulse   0.721  0.619  0.659   0.465  0.402
Stress  0.164  0.368  0.034   0.018  0.312  0.506

• Cell contents: Pearson correlation.
• Blood pressure appears to be related fairly strongly to Weight (r = 0.950)
and BSA (r = 0.866), and hardly related at all to Stress level (r = 0.164).
• Weight and BSA appear to be strongly related (r = 0.875)
• The high correlation among some of the predictors suggests that data-
based multicollinearity exists.
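As an illustrative sketch (not from the original notes), pairwise correlations and variance inflation factors, VIF_j = 1/(1 - R_j^2), can be computed as below; the small data frame is a hypothetical stand-in for the 20-observation blood-pressure dataset.

```python
import numpy as np
import pandas as pd

def vif(df, col):
    """VIF of one predictor: regress it on the others and use 1/(1 - R^2)."""
    others = df.drop(columns=[col])
    X = np.column_stack([np.ones(len(df)), others.to_numpy()])
    y = df[col].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

# Hypothetical stand-in for three of the predictors above.
df = pd.DataFrame({
    "Age":    [47, 49, 49, 50, 51, 48, 49, 47],
    "Weight": [85.4, 94.2, 95.3, 94.7, 89.4, 99.5, 99.8, 90.9],
    "BSA":    [1.75, 2.10, 1.98, 2.01, 1.89, 2.25, 2.25, 1.90],
})
print(df.corr())                         # pairwise Pearson correlations
for c in df.columns:
    print(c, round(vif(df, c), 2))       # large VIFs flag multicollinearity
```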
Assumption #5:
• There should be
– no significant outliers,
– no high leverage points, and
– no highly influential points.
• These different classifications of unusual points reflect the different
impact they have on the regression line.
What are outliers in the data?
• An outlier is an observation that lies an abnormal distance from other values
in a random sample from a population.
• The box plot is a useful graphical display for describing the behavior of the
data in the middle as well as at the ends of the distributions.
• The following quantities (called fences) are needed for identifying extreme values in the tails of the distribution, where IQ = Q3 - Q1 is the interquartile range:
– lower inner fence: Q1 - 1.5*IQ
– upper inner fence: Q3 + 1.5*IQ
– lower outer fence: Q1 - 3*IQ
– upper outer fence: Q3 + 3*IQ
• A point beyond an inner fence on either side is considered a mild outlier. A
point beyond an outer fence is considered an extreme outlier.
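A minimal sketch of computing the fences with NumPy; the data vector is hypothetical, and quartile conventions differ slightly across software.

```python
import numpy as np

data = np.array([2, 3, 4, 5, 5, 6, 7, 8, 9, 30], dtype=float)  # hypothetical

q1, q3 = np.percentile(data, [25, 75])
iq = q3 - q1                               # interquartile range
lower_inner, upper_inner = q1 - 1.5 * iq, q3 + 1.5 * iq
lower_outer, upper_outer = q1 - 3.0 * iq, q3 + 3.0 * iq

# Mild outliers lie beyond an inner fence but within the outer fences;
# extreme outliers lie beyond an outer fence.
mild = data[((data < lower_inner) & (data >= lower_outer)) |
            ((data > upper_inner) & (data <= upper_outer))]
extreme = data[(data < lower_outer) | (data > upper_outer)]
print(mild, extreme)                        # here 30 is an extreme outlier
```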
Assumption #6:
• You should have independence of observations (i.e., independence of
residuals), which you can easily check using the Durbin-Watson statistic
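The statistic itself is easy to compute directly from the residuals, d = Σ(e_t - e_{t-1})^2 / Σ e_t^2; values near 2 suggest no first-order autocorrelation, while values near 0 or 4 suggest positive or negative autocorrelation. A small sketch with hypothetical residuals:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by their sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

print(durbin_watson([0.5, -0.3, 0.2, -0.4, 0.1]))  # hypothetical residuals
```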
Assumption #7:
• There needs to be a linear relationship between
– the dependent variable and each of your independent
variables
Assumption #8:
• Finally, you need to check that the residuals (errors) are
approximately normally distributed
• Two common methods to check this assumption include using:
– histogram (with a superimposed normal curve) and a
Normal P-P Plot;
– Normal Q-Q Plot of the studentized residuals.
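A sketch of both checks with SciPy and Matplotlib, using simulated values in place of real regression residuals:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(0, 1, 200)      # stand-in for regression residuals

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram with a superimposed normal curve.
ax1.hist(residuals, bins=20, density=True)
grid = np.linspace(residuals.min(), residuals.max(), 200)
ax1.plot(grid, stats.norm.pdf(grid, residuals.mean(), residuals.std()))
ax1.set_title("Histogram with normal curve")

# Normal Q-Q (probability) plot: points near the line support normality.
stats.probplot(residuals, dist="norm", plot=ax2)
ax2.set_title("Normal Q-Q plot")
plt.show()
```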
Multiple Linear Regression
• Suppose that we have a linear model
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
and that we make $n$ independent observations $y_1, y_2, \ldots, y_n$ on $Y$.
• We can write the $i$th observation as
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i,$$
• where $x_{ij}$ is the setting of the $j$th independent variable for the $i$th observation, $i = 1, 2, \ldots, n$.
• We now define the following matrices, with $x_{i0} = 1$:
$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} x_{10} & x_{11} & \cdots & x_{1k} \\ x_{20} & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n0} & x_{n1} & \cdots & x_{nk} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
• Thus, the $n$ equations representing $y_i$ as a function of the $x$'s, $\beta$'s and $\varepsilon$'s can be written simultaneously as
$$Y = X\beta + \varepsilon.$$
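A minimal NumPy sketch of this matrix form with k = 2 hypothetical predictors, solving the normal equations $(X'X)\beta = X'Y$:

```python
import numpy as np

# Hypothetical observations on Y and two independent variables.
y = np.array([3.0, 5.0, 6.5, 8.0, 11.0])
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

X = np.column_stack([np.ones_like(x1), x1, x2])  # first column is x_{i0} = 1
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # [b0, b1, b2]
print(beta_hat)
```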
Regression with Two Independent Variables
• For $n$ observations from a simple linear regression model of the form
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$
• the least-squares equations for $\hat\beta_0$ and $\hat\beta_1$ were given in the previous section as
$$\sum_i y_i = n\hat\beta_0 + \hat\beta_1 \sum_i x_i$$
$$\sum_i x_i y_i = \hat\beta_0 \sum_i x_i + \hat\beta_1 \sum_i x_i^2$$
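Equivalently, solving these two equations gives the familiar closed forms $\hat\beta_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$; a small sketch with hypothetical data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)                                  # intercept and slope estimates
```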
Regression with Two Independent Variables
• Assume the production-function model below,
$$Y = \beta_0 + \beta_1 L + \beta_2 K + \varepsilon,$$
• where $Y$ is total production, $L$ is labor input and $K$ is total capital. The information about each factor is given below for the 15-year period from 2001 to 2015.

Year   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Y     20  35  30  47  60  68  76  90 100 105 130 140 125 120 135
L     10  15  21  26  40  37  42  33  30  38  60  65  50  35  42
K     12  10   9   8   5   7   4   5   7   5   3   4   3   1   2

• By using the above data, estimate the $\beta_0$, $\beta_1$ and $\beta_2$ parameters of the model by using the ordinary least squares (OLS) method.
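A sketch of the OLS computation for this exercise, using the table's data; the linear form $Y = \beta_0 + \beta_1 L + \beta_2 K + \varepsilon$ is the reconstruction assumed above:

```python
import numpy as np

# Production (Y), labor (L) and capital (K) from the table above.
Y = np.array([20, 35, 30, 47, 60, 68, 76, 90, 100, 105,
              130, 140, 125, 120, 135], dtype=float)
L = np.array([10, 15, 21, 26, 40, 37, 42, 33, 30, 38,
              60, 65, 50, 35, 42], dtype=float)
K = np.array([12, 10, 9, 8, 5, 7, 4, 5, 7, 5,
              3, 4, 3, 1, 2], dtype=float)

X = np.column_stack([np.ones_like(L), L, K])    # intercept, labor, capital
b0, b1, b2 = np.linalg.solve(X.T @ X, X.T @ Y)  # OLS estimates
print(b0, b1, b2)
```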