1.1 Regression Analysis
• Predicting numbers
• Supervised Learning
• Simple linear regression
• Multiple linear regression
• K-Nearest Neighbors
• Decision tree-based methods
• Artificial neural networks
Simple Linear Regression
• A straightforward approach for predicting a quantitative
response Y on the basis of a single predictor
variable X
• The model: Y ≈ β0 + β1X, with intercept β0 and slope β1
• Residual: eᵢ = yᵢ − ŷᵢ, the difference between the i-th observed and fitted values
• Variance σ²: the square of the standard
deviation of each of the
observations yᵢ of Y
• Standard error: the average amount an estimate differs from the true value,
e.g. SE(μ̂)² = σ²/n for the sample mean
• The true regression line: Y = β0 + β1X + ε
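As a minimal sketch of these quantities, the least-squares estimates, residuals, and residual-variance estimate can be computed directly with NumPy. The toy data below (true line y = 2 + 3x plus unit-variance noise) are an assumption for illustration, not data from the slides:

```python
import numpy as np

# Hypothetical toy data; x is the single predictor, y the response
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=50)  # true line: beta0=2, beta1=3

# Least-squares estimates of the coefficients
x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

y_hat = beta0_hat + beta1_hat * x
residuals = y - y_hat                # e_i = y_i - yhat_i
rss = np.sum(residuals ** 2)         # residual sum of squares
sigma2_hat = rss / (len(x) - 2)      # estimate of Var(eps), n - 2 df

print(beta0_hat, beta1_hat, sigma2_hat)
```

The divisor n − 2 in the variance estimate anticipates the degrees-of-freedom point made later: two parameters are estimated from the data.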
• The standard errors of the coefficients of the simple regression line can
be computed as:
SE(β̂0)² = σ² [1/n + x̄² / Σ(xᵢ − x̄)²],   SE(β̂1)² = σ² / Σ(xᵢ − x̄)²
• Confidence interval: a range of values such that with 95% probability, the range will contain
the true unknown value of a parameter
• For β1, the approximate 95% confidence interval is β̂1 ± 2 · SE(β̂1)
Why? Because for a normal distribution, roughly 95% of the probability lies within
two standard errors of the mean
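The standard-error formulas and the ±2·SE interval can be sketched numerically as follows; the toy data (true slope 3) are an assumption for illustration:

```python
import numpy as np

# Hypothetical toy data (assumed, not from the slides)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=50)

n = len(x)
x_bar = x.mean()
sxx = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / sxx
beta0_hat = y.mean() - beta1_hat * x_bar
rss = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)
sigma2_hat = rss / (n - 2)          # residual variance estimate

# Standard errors of the two coefficients
se_beta1 = np.sqrt(sigma2_hat / sxx)
se_beta0 = np.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / sxx))

# Approximate 95% confidence interval for beta1
ci_low, ci_high = beta1_hat - 2 * se_beta1, beta1_hat + 2 * se_beta1
print((ci_low, ci_high))
```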
• The t-statistic: t = (β̂1 − 0) / SE(β̂1)
The number of standard
deviations β̂1 is away from
zero
• Measures whether β̂1 is sufficiently far away from 0 to conclude that the true value of β1 is
non-zero
Are X and Y related? The p-value
• A p-value computed from the t-distribution
• If there is no relationship between X and Y, t follows a t-distribution with n − 2 degrees
of freedom (n − 2 because we estimate
two parameters, β0 and β1)
• n > 30 makes it close to a normal distribution
• Compute the probability of observing any value equal to |t| or larger, assuming β1 = 0
• The p-value
• Provides the smallest level of significance at which the null hypothesis
would be rejected
• A smaller p-value suggests stronger evidence of an association between X and Y
• Typical cutoff values of p are 5% and 1%
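Putting the t-statistic and p-value together as a sketch: since the slide notes that for n > 30 the t-distribution is close to normal, the normal CDF from the standard library is used here as an approximation. The data are hypothetical:

```python
import numpy as np
from statistics import NormalDist

# Hypothetical toy data with a strong true relationship (slope 3)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=50)

n = len(x)
x_bar = x.mean()
sxx = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / sxx
beta0_hat = y.mean() - beta1_hat * x_bar
rss = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)
se_beta1 = np.sqrt(rss / (n - 2) / sxx)

t = (beta1_hat - 0) / se_beta1                 # std. deviations beta1_hat is from 0
p_value = 2 * (1 - NormalDist().cdf(abs(t)))   # P(|T| >= |t|) under H0: beta1 = 0
print(t, p_value)
```

A p-value below the usual 5% or 1% cutoff would lead us to reject the null hypothesis of no relationship.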
Is there a relationship between TV
advertising and sales?
🠶 What would you infer?
How well does the model fit the data?
• The F-statistic here is way above 1:
at least one
advertising medium is
related to sales
• What if n is larger?
• An F value slightly above 1 is sufficient to reject the null hypothesis
• What’s the right F value to reject the null hypothesis?
• The p-value from the F-distribution (when the errors are normally distributed)
• A smaller p-value suggests a relationship between the predictors and the
response
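The overall F-test can be sketched numerically; the data below are synthetic with p = 3 predictors (only one actually related to the response), not the Advertising data:

```python
import numpy as np

# Sketch of the F-test for H0: beta_1 = ... = beta_p = 0 (hypothetical data)
rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)   # only predictor 0 matters

# Fit by least squares with an intercept column
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
rss = np.sum((y - A @ coef) ** 2)              # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)              # total sum of squares

# F = ((TSS - RSS) / p) / (RSS / (n - p - 1))
F = ((tss - rss) / p) / (rss / (n - p - 1))
print(F)   # far above 1: reject H0, at least one predictor is related to y
```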
Can we test the null hypothesis for a
subset of coefficients?
• The corresponding null hypothesis for a subset of the last q coefficients:
H0: β(p−q+1) = β(p−q+2) = … = βp = 0
• Can test models with such combinations and choose the best model
• Methods used:
• Mallows’ Cp
• Akaike Information Criterion (AIC)
• Bayesian Information Criterion (BIC)
• Adjusted R2
• What would be the challenge here when combining variables? With p variables
there are 2^p possible subsets to compare
Variable selection
• Automated and efficient methods to choose smaller yet effective
subsets of variables
• Common approaches:
• Forward selection
• Start with the null model (only intercept)
• Evaluate p simple linear regression models and add the variable with the lowest
RSS to the null model
• Evaluate the new set of two variable models and add the variable with the lowest
RSS to the model
• Continue until a particular stopping criterion is met
• Backward selection
• Start with all variables
• Remove the variable with the highest p-value
• Re-fit the model, again remove the variable with the highest p-value, and so on
Variable selection
• Common approaches
• Mixed selection
• Start with a no-variable model
• Keep adding one by one like in the forward selection
• In case the p-value of any variable is above a certain threshold, remove that
variable
• Continue this back-and-forth process until a stopping criterion is met
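The forward-selection loop described above can be sketched as a greedy search on RSS. The helper names (`fit_rss`, `forward_selection`) and the synthetic data are assumptions, and a fixed number of variables stands in for a real stopping criterion such as adjusted R²:

```python
import numpy as np

def fit_rss(X, y):
    """Least-squares fit with an intercept; return the residual sum of squares."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return np.sum(resid ** 2)

def forward_selection(X, y, k):
    """Greedy forward selection: repeatedly add the variable that lowers
    RSS the most, stopping after k variables (a stand-in criterion)."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        best = min(remaining, key=lambda j: fit_rss(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical data: only columns 0 and 2 actually drive y
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 4.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(0, 0.5, size=200)
print(forward_selection(X, y, 2))
```

Backward and mixed selection reuse the same skeleton, removing or re-checking variables by p-value instead of only adding by RSS.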
• The model, once fitted to training data, can be used for predictions
• Challenges
• Coefficient estimates are distant from the actual population parameter values
• The least-squares plane differs from the true population regression plane
• How to handle qualitative predictors: male/female?
• Or three levels: large, medium, small?
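One common answer to the male/female question is to encode a qualitative predictor as a 0/1 dummy variable; a three-level factor (large/medium/small) needs two dummies, with one level as the baseline. The data and level names below are made up for illustration:

```python
import numpy as np

# Hypothetical qualitative predictors
gender = np.array(["male", "female", "female", "male", "female"])
size = np.array(["large", "small", "medium", "large", "medium"])

is_female = (gender == "female").astype(int)   # 1 dummy for 2 levels
is_medium = (size == "medium").astype(int)     # 2 dummies for 3 levels;
is_small = (size == "small").astype(int)       # "large" is the baseline

# Design matrix of dummy variables, ready for least squares
X = np.column_stack([is_female, is_medium, is_small])
print(X)
```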
Issues with standard linear regression