Unit 5
Regression Analysis
Objectives
After going through this unit, you will be able to:
Explain Regression Analysis
Explain Simple Regression
Explain Multiple Regression
Structure
5.1 Introduction
5.2 Simple Regression Analysis
5.3 Multiple Regression Analysis
5.4 Assessing the Regression Equation
5.5 Key Words
5.6 Summary
5.1 INTRODUCTION
The most commonly used form of regression is linear regression, and the most common type of linear
regression is called ordinary least squares regression.
5.2 SIMPLE REGRESSION ANALYSIS
Linear regression uses the values from an existing data set consisting of measurements of the values
of two variables, X and Y, to develop a model that is useful for predicting the value of the dependent
variable, Y, for given values of the independent variable, X.
For example, say we know what the average speed of cars on the freeway is when we have 2 highway
patrols deployed (average speed=75 mph) or 10 highway patrols deployed (average speed=35 mph).
But what will be the average speed of cars on the freeway when we deploy 5 highway patrols?
From our known data, we can use the regression formula (calculations not shown) to compute the
values of a and b and obtain the following equation: Y = 85 + (-5)X, where Y is the predicted average
speed and X is the number of highway patrols deployed.
That is, the average speed of cars on the freeway when there are no highway patrols working (X=0)
will be 85 mph. For each additional highway patrol car working, the average speed will drop by 5 mph.
For five patrols (X=5), Y = 85 + (-5) (5) = 85 - 25 = 60 mph
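A quick way to verify this arithmetic is to fit the line programmatically. The following is a minimal Python sketch using NumPy; the variable names are ours, and the data are just the two points given above:

```python
import numpy as np

patrols = np.array([2.0, 10.0])   # X: number of highway patrols deployed
speed = np.array([75.0, 35.0])    # Y: average freeway speed in mph

# Fit Y = a + bX by least squares; np.polyfit returns [slope, intercept].
b, a = np.polyfit(patrols, speed, 1)
print(a, b)        # 85.0 and -5.0, matching the equation above

print(a + b * 5)   # predicted average speed for five patrols: 60.0 mph
```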
There may be some variations on how regression equations are written in the literature. For example,
you may sometimes see the dependent variable term (Y) written with a little “hat” ( ^ ) on it, or called
Y-hat. This refers to the predicted value of Y. The plain Y refers to observed values of Y in the data set
used to calculate the regression equation.
You may see the symbols for alpha (a) and beta (b) written in Greek letters, or you may see them
written in English letters. The coefficient of the independent variable may have a subscript, as may
the term for X, for example, b1X1 (this is common in multiple regression).
In theory, there are several important assumptions that must be satisfied if linear regression is to be
used. These are:
1. Both the independent (X) and the dependent (Y) variables are measured at the interval or
ratio level.
2. The relationship between the independent (X) and the dependent (Y) variables is linear.
3. Errors in prediction of the value of Y are distributed in a way that approaches the normal
curve.
4. Errors in prediction of the value of Y are all independent of one another.
5. The distribution of the errors in prediction of the value of Y is constant regardless of the value
of X.
There are a number of advanced statistical tests that can be used to examine whether or not these
assumptions are true for any given regression equation. However, these are beyond the scope of this
discussion.
5.3 MULTIPLE REGRESSION ANALYSIS
In multiple regression, the dependent variable y is modelled as a linear function of p explanatory variables:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + u

where u is the error term. The model rests on the following assumptions:
1. E[u_i] = 0 and V[u_i] = \sigma^2 for each observation i.
2. Because the observations y_1, y_2, ..., y_n are a random sample, they are mutually independent, and hence the error terms are also mutually independent.
3. The distribution of the error term is independent of the joint distribution of x_1, x_2, ..., x_p.
4. The unknown parameters \beta_0, \beta_1, \beta_2, ..., \beta_p are constants.
The parameters \beta_0, \beta_1, \beta_2, ..., \beta_p can be estimated using the least squares procedure, which minimizes the sum of squares of errors. Minimizing the sum of squares leads to the normal equations, from which the values of the estimates a, b_1, b_2, ..., b_p can be computed. In matrix notation,

(X'X) b = X'y, so that b = (X'X)^{-1} X'y

where X is the matrix of observations on the explanatory variables (with a leading column of ones for the intercept a), y is the vector of observations on the dependent variable, and b = (a, b_1, ..., b_p)'.
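To make the procedure concrete, here is a minimal Python/NumPy sketch of solving the normal equations for a small, entirely hypothetical data set (the true coefficients 4.0, 2.0 and -1.5 are our own choices, not from the text):

```python
import numpy as np

# Hypothetical sample: n = 30 observations on p = 2 explanatory variables.
rng = np.random.default_rng(0)
n, p = 30, 2
X = rng.normal(size=(n, p))
y = 4.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Design matrix with a leading column of ones for the intercept a.
Xd = np.column_stack([np.ones(n), X])

# Normal equations (X'X) b = X'y, solved for b = (a, b1, b2).
b = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)
print(b)  # approximately [4.0, 2.0, -1.5]
```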
The standard error of estimate is

S_e = \sqrt{ \frac{\sum (y_i - \hat{y}_i)^2}{n - p - 1} }

where n is the number of observations and p is the number of independent variables. The denominator
of the equation indicates that in multiple regression with p independent variables, the standard error
has n - p - 1 degrees of freedom. This happens because the degrees of freedom are reduced from n by
the p + 1 numerical constants a, b_1, b_2, ..., b_p that have been estimated from the sample.
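Continuing the same hypothetical fit, the standard error of estimate and its n - p - 1 degrees of freedom might be computed like this:

```python
import numpy as np

# Same hypothetical data and fit as in the previous sketch.
rng = np.random.default_rng(0)
n, p = 30, 2
X = rng.normal(size=(n, p))
y = 4.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
Xd = np.column_stack([np.ones(n), X])
b = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)

# p + 1 constants (a, b1, ..., bp) were estimated, leaving n - p - 1 df.
residuals = y - Xd @ b
se = np.sqrt(np.sum(residuals**2) / (n - p - 1))
print(se)  # close to the true error scale of 0.5 used to generate the data
```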
Fit of the regression model
The fit of the multiple regression model can be assessed by the coefficient of multiple
determination, which is a fraction that represents the proportion of the total variation of y that is
explained by the regression plane. The relevant sums of squares are

SSE = \sum (y_i - \hat{y}_i)^2 (the error sum of squares),

SSR = \sum (\hat{y}_i - \bar{y})^2 (the regression sum of squares),

SST = \sum (y_i - \bar{y})^2 (the total sum of squares).

Obviously,

SST = SSR + SSE.

The ratio SSR/SST represents the proportion of the total variation in y explained by the
regression model. This ratio, denoted by R^2, is called the coefficient of multiple
determination. R^2 is sensitive to the magnitudes of n and p in small samples: if p is large
relative to n, the model tends to fit the data very well. In the extreme case, if n = p + 1, the
model would fit the data exactly.
A better goodness-of-fit measure is the adjusted R^2, which is computed as follows:

\bar{R}^2 = 1 - (1 - R^2) \frac{n - 1}{n - p - 1}
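A sketch of these fit measures in code, again on the hypothetical data set used above:

```python
import numpy as np

# Same hypothetical data and fit as in the previous sketches.
rng = np.random.default_rng(0)
n, p = 30, 2
X = rng.normal(size=(n, p))
y = 4.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
Xd = np.column_stack([np.ones(n), X])
b = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)
y_hat = Xd @ b

sse = np.sum((y - y_hat) ** 2)         # unexplained (error) variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the plane
sst = np.sum((y - y.mean()) ** 2)      # total variation; SST = SSR + SSE
r2 = ssr / sst
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(r2, r2_adj)
```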
The overall goodness of fit of the regression model (i.e., whether the regression model is at all
helpful in predicting the values of y) can be evaluated using an F-test in the format of an analysis
of variance.
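In code, the ANOVA-style F statistic is the explained mean square divided by the unexplained mean square, F = (SSR/p) / (SSE/(n - p - 1)); a sketch using SciPy for the tail probability, on the same hypothetical data:

```python
import numpy as np
from scipy import stats

# Same hypothetical data and fit as in the previous sketches.
rng = np.random.default_rng(0)
n, p = 30, 2
X = rng.normal(size=(n, p))
y = 4.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
Xd = np.column_stack([np.ones(n), X])
b = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)
y_hat = Xd @ b

sse = np.sum((y - y_hat) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)

# F = explained mean square / unexplained mean square,
# with p and n - p - 1 degrees of freedom.
f_stat = (ssr / p) / (sse / (n - p - 1))
p_value = stats.f.sf(f_stat, p, n - p - 1)  # upper-tail probability
print(f_stat, p_value)
```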
The statistical significance of an individual regression coefficient b_j can be assessed by computing

t = \frac{b_j}{s_{b_j}}

where s_{b_j} is the standard error of b_j, and performing a one- or two-tailed t-test with n - p - 1
degrees of freedom.
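A sketch of the coefficient t-tests on the same hypothetical data; the standard errors are taken from the diagonal of S_e^2 (X'X)^{-1}:

```python
import numpy as np
from scipy import stats

# Same hypothetical data as in the previous sketches.
rng = np.random.default_rng(0)
n, p = 30, 2
X = rng.normal(size=(n, p))
y = 4.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
Xd = np.column_stack([np.ones(n), X])
XtX_inv = np.linalg.inv(Xd.T @ Xd)
b = XtX_inv @ Xd.T @ y

# Estimated error variance and coefficient standard errors.
s2 = np.sum((y - Xd @ b) ** 2) / (n - p - 1)
se_b = np.sqrt(s2 * np.diag(XtX_inv))

t = b / se_b                                   # one t-value per coefficient
p_two_tailed = 2 * stats.t.sf(np.abs(t), n - p - 1)
print(t, p_two_tailed)
```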
Standardized regression coefficients
The magnitude of the regression coefficients depends upon the scales of measurement
used for the dependent variable y and the explanatory variables included in the regression
equation. Unstandardized regression coefficients cannot be compared directly because of
differing units of measurement and different variances of the x variables. It is therefore
necessary to standardize the variables for meaningful comparisons.
The estimated model can be written in standardized form as

\frac{\hat{y} - \bar{y}}{s_y} = \frac{b_1 s_1}{s_y} \left( \frac{x_1 - \bar{x}_1}{s_1} \right) + \frac{b_2 s_2}{s_y} \left( \frac{x_2 - \bar{x}_2}{s_2} \right) + \cdots + \frac{b_p s_p}{s_y} \left( \frac{x_p - \bar{x}_p}{s_p} \right)

The expressions in the parentheses are standardized variables; the b's are unstandardized
regression coefficients; s_1, s_2, ..., s_p are the standard deviations of the variables x_1, x_2, ..., x_p;
and s_y is the standard deviation of the variable y.
The coefficients (b_j s_j)/s_y, j = 1, 2, ..., p, are called standardized regression coefficients. The
standardized regression coefficient measures the impact of a unit change in the standardized
value of x_j on the standardized value of y. The larger the magnitude of the standardized
coefficient, the more x_j contributes to the prediction of y. However, the regression equation
itself should be reported in terms of the unstandardized regression coefficients, so that
predictions of y can be made directly from the x variables.
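A sketch of computing the standardized coefficients b_j s_j / s_y on the same hypothetical data:

```python
import numpy as np

# Same hypothetical data and fit as in the previous sketches.
rng = np.random.default_rng(0)
n, p = 30, 2
X = rng.normal(size=(n, p))
y = 4.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
Xd = np.column_stack([np.ones(n), X])
b = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)

# Standardized coefficient b_j * s_j / s_y; the intercept b[0] is excluded.
s_x = X.std(axis=0, ddof=1)
s_y = y.std(ddof=1)
beta_std = b[1:] * s_x / s_y
print(beta_std)  # comparable across the x variables despite differing units
```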
5.4 ASSESSING THE REGRESSION EQUATION
We now have a regression equation. But how good is the equation at predicting values of Y for given
values of X? For that assessment, we turn to measures of association and measures of statistical
significance that are used with regression equations.
r2: It is a measure of association; it represents the proportion of the variance in the values of Y
that can be explained by knowing the value of X. It varies from a low of 0.0 (none of the
variance is explained) to a high of 1.0 (all of the variance is explained).
Standard error of the computed value of b: A t-test for statistical significance of the coefficient
is conducted by dividing the value of b by its standard error. As a rule of thumb, a t-value
greater than 2.0 is usually statistically significant, but you must consult a t-table to be sure. If
the t-value indicates that the b coefficient is statistically significant, this means that the
independent variable X (number of patrol cars deployed) should be kept in the regression
equation, since it has a statistically significant relationship with the dependent variable Y
(average speed in mph). If the relationship were not statistically significant, the value of the b
coefficient would be (statistically speaking) indistinguishable from zero.
F: It is a test for statistical significance of the regression equation as a whole. It is obtained by dividing
the explained variance by the unexplained variance. As a rule of thumb, an F-value greater than 4.0
is usually statistically significant, but you must consult an F-table to be sure. If F is significant, then the
regression equation helps us to understand the relationship between X and Y.
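These measures can be obtained together with scipy.stats.linregress. The sketch below uses a hypothetical extension of the patrol example (the original two data points are too few for meaningful significance tests); the extra observations are invented purely for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical extension of the patrol example with more observations.
patrols = np.array([0, 2, 4, 5, 6, 8, 10], dtype=float)
speed = np.array([84, 74, 66, 61, 54, 46, 34], dtype=float)

fit = stats.linregress(patrols, speed)
print(fit.intercept, fit.slope)   # a and b
print(fit.rvalue ** 2)            # r2: share of the variance in Y explained
print(fit.slope / fit.stderr)     # t-value for b; |t| > 2 suggests significance
print(fit.pvalue)                 # two-tailed p-value for the test on b
```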
5.5 KEYWORDS
The coefficient of determination: It tells the amount of variability in one variable explained
by variability in the other variable.
Linear relationship: From the definition of correlation as the degree of linear relationship
between two variables, we can use the correlation coefficient to compute the equations for
the straight lines best describing the relationship between the variables.
Regression equations: The equations (one to predict X and one to predict Y) are called
regression equations, and we can use them to predict a score on one variable if we know a
score on the other.
The least squares line: The general form of the equation is Y = bX + a, where ‘b’ is the slope
of the line and ‘a’ is where the line intercepts the Y axis. The regression line is also called the
least squares line.
5.6 SUMMARY
Most parametric models are “regression models.” Regression models require data sets from past
performance in order that a regression formula can be derived. The regression formula is used to
predict or forecast future performance. Thus, to employ parametric models they first must be
calibrated with history. Calibration requires some standardization of the definition of deliverable
items and item attributes. Once a calibrated model is in hand, it is fed with parameter data of the
project being estimated in order to obtain estimates of the deliverables. Model parameters are also set or adjusted to
account for similarity or dissimilarity between the project being estimated and the project history.
Usually, a methodology is incorporated into the model. Some models also allow for specification of
risk factors as well as the severity of those risks.