This document provides an introduction to linear regression analysis using SAS. It discusses identifying the dependent and independent variables, estimating the regression model, and using the model to make predictions. Key steps covered include checking assumptions like linearity, normality, and independence of errors. Diagnostic tools like residuals, influence statistics, and multicollinearity measures are also summarized. The goal is to fit a linear regression model to predict jet engine thrust based on various operational variables.
COMP-STAT GROUP

The aim of this presentation is to explain the important steps involved in a linear regression setup. We will proceed in the logical flow of the process: identification, estimation and prediction.

Introduction
Regression is the study of dependence. Does changing the class size affect the success of students? We explain the dependent variable mathematically in terms of a set of independent variables.

Regression Models

The Model
Y = β0 + β1X1 + β2X2 + ... + βkXk + ε
Y is the dependent variable, the Xs are the independent variables and ε is the error term. Observe that the model is linear in the coefficients. What does linearity mean here?
Simple linear regression: a model with only one predictor.
Estimation: least squares and/or maximum likelihood estimators.

Assumptions
Main assumptions: linearity, normality, homoscedasticity, independence (of explanatory variables and of error terms).
Practical concerns: number of cases, data accuracy, missing data, outliers. What do they mean?

Assumptions (contd.)
Number of cases: the cases-to-independent-variables ratio should ideally be 20:1 (minimum 5:1).
Accuracy of data: the data you have entered should consist of valid data points.
Missing data: their treatment is necessary.
Outliers: discussed under regression diagnostics below.

Objectives of analysis
Estimation, hypothesis testing, confidence intervals and prediction of new observations.
Let us take a real-life problem and then proceed further.

An example
We have data on jet engine thrust as the response variable, and primary speed of rotation, secondary speed of rotation, fuel flow rate, pressure, exhaust temperature and ambient temperature at the time of test as regressor variables. The objective is to fit a linear regression model, check whether the model satisfies all the underlying assumptions, and use it to predict future observations correctly.

Variable selection
Important algorithms: forward selection, backward elimination, stepwise regression (preferred).
Always start with your domain knowledge; it will guide you through the selection of variables from the set of candidate variables. Don't rely too heavily on variable-selection algorithms, since they are purely data driven. A minimal sketch of stepwise selection follows.
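As a sketch of stepwise selection with PROC REG, assuming the dataset is named test and using hypothetical variable names for the jet engine data (thrust, primary_rpm, secondary_rpm, fuel_flow, pressure, exhaust_temp and ambient_temp are placeholders, not names taken from the original data):

proc reg data=test;
   /* stepwise selection: regressors enter and leave at the stated significance levels */
   model thrust = primary_rpm secondary_rpm fuel_flow pressure exhaust_temp ambient_temp
         / selection=stepwise slentry=0.15 slstay=0.15;
run;

Whatever model the algorithm selects should still be reviewed against domain knowledge before it is used.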
Categorical independent variables
How do we incorporate qualitative variables in the analysis? Through the concept of dummy variables. We include k-1 dummies for k categories, with one category set as the base category; the dummies then act like usual variables in the linear regression setup.
Suppose we have three categories of TV, A, B and C. Then we include 2 dummies; if the dummies are X and Y, they take values as follows:

Category   X   Y
A          0   0
B          1   0
C          0   1

Post-estimation concerns
We have seen the model outputs and analyzed them. Once the model is estimated, the next step is to check whether it satisfies all the assumptions stated above. If all the assumptions are satisfied we are good; otherwise corrections and modifications must be made before the model is ready for use.

Regression Diagnostics

Residuals
e_i = y_i - ŷ_i. The smaller the residuals, the better the model.
Types: standardized residuals (Std.R), studentized residuals (Stdnt.R), PRESS residuals and R-student residuals.
|Std.R| > 3 indicates a potential outlier; it is better to also look at Stdnt.R.
PRESS (prediction error sum of squares) residuals, also called deleted residuals: estimate the model with that observation deleted and then calculate the predicted value for that observation. The residual so obtained is the PRESS residual; a high value indicates a high-influence point.
SAS code:
proc reg data=test;
   model y = x1 x2 x3 x4;
   output out=dataset student=stdres rstudent=rstud press=pressres;
run;

Residual plots
Normal probability plots: a plot of normal quantiles against residual quantiles; a straight line confirms the normality assumption for the residuals. The plot is highly sensitive to non-normality near the two tails and can also be helpful in outlier detection.
Statistical tests: Kolmogorov-Smirnov test, Anderson-Darling test, Shapiro-Wilk test.
SAS code:
proc univariate data=residuals normal; /* the normal option requests normality tests */
   var r;
   qqplot r / normal(mu=est sigma=est); /* est estimates the mean and variance from the data itself */
run;

Residual plots (contd.): homogeneity of error variance
This checks the homoscedasticity assumption on the error variance. If the assumption holds, the plot of residuals against predicted values should show a random pattern. The plot can also reveal one or more unusually large residuals, which of course are potential outliers. If the plot is not random, you may need to apply some transformation to the regressors.
White test: tests the null hypothesis that the variance of the residuals is homogeneous; use the SPEC option in the MODEL statement.
Remedy: resort to generalized least squares estimators.
SAS code:
proc reg data=dataset;
   model y = x1 x2 x3 / spec;
   plot r.*p.; /* plot of residuals vs. predicted values */
run;

Outlier treatment
An outlier is an extreme observation. Residuals that are considerably larger in absolute value than the others, say 3 or 4 standard deviations from the mean, indicate potential y-space outliers. Outliers are data points that are not typical of the rest of the data. Residual plots and the normal probability plot are helpful in identifying outliers; studentized or R-student residuals can also be used. An outlier should be removed from the data before estimating the model only if it is a bad value, and there should be strong non-statistical evidence that it is a bad value before it is discarded. Sometimes outliers are actually desired in the analysis (you may want points of high yield or, say, low cost).

Diagnostics for leverage and influence
Leverage:
o An observation with an extreme value on a predictor variable is called a point with high leverage.
o Leverage is a measure of how far an independent variable deviates from its mean.
o Leverage points can have an effect on the estimates of the regression coefficients.
o A common cutoff is leverage > (2p+2)/n.
Influential observations:
o An observation is said to be influential if removing it substantially changes the estimates of the coefficients.
o Influence can be thought of as the product of leverage and outlyingness.
o Not all leverage points are going to be influential on the regression coefficients, so it is desirable to consider both the location of the point in the x-space and the response variable when measuring influence.
o Measures: Cook's D (> 1), DFFITS (|DFFITS| > 2*sqrt(p/n)), DFBETAS (|DFBETAS| > 2/sqrt(n)).
SAS code: use COOKD=name1 DFFITS=name2 H=name3 /* H is for leverage */ in the OUTPUT statement of PROC REG (you can also use the INFLUENCE option in the MODEL statement for a detailed analysis). A sketch that flags observations against the cutoffs above follows.
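A minimal sketch of pulling these statistics into an output dataset and flagging observations against the cutoffs, assuming four regressors and 40 observations (the dataset name, variable names and the values of p and n are placeholders to be replaced with your own):

proc reg data=test;
   model y = x1 x2 x3 x4 / influence;            /* detailed influence analysis in the listing */
   output out=diag cookd=cd dffits=dfits h=lev;  /* h= stores the leverage values */
run;

data flagged;
   set diag;
   p = 4;  n = 40;                               /* number of regressors and observations */
   flag_cook     = (cd > 1);                     /* Cook's D cutoff */
   flag_dffits   = (abs(dfits) > 2*sqrt(p/n));   /* DFFITS cutoff */
   flag_leverage = (lev > (2*p + 2)/n);          /* leverage cutoff (2p+2)/n */
run;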
Multicollinearity
Multicollinearity arises when the explanatory variables are not independent (there is a near-perfect linear relationship among them).
Reasons: faulty data collection method, constraints on the model or in the population, model specification, an over-defined model.
Effects: unstable coefficient estimates and inflated standard errors of the coefficient estimates.
Tools to detect it:
- Examine the correlation matrix of the independent variables.
- Variance inflation factor, VIF (> 10); tolerance is 1/VIF.
- Condition number and condition indices (> 1000 indicates severe multicollinearity).
- Variance decomposition proportions.
SAS code:
proc reg data=test;
   model y = x1 x2 / vif tol collinoint;
   /* COLLINOINT gives a detailed collinearity analysis with the intercept adjusted out;
      the COLLIN option gives the same analysis including the intercept */
run;
Remedies: collecting additional data, model respecification, redefining the regressors, variable elimination.

Linearity
A scatter plot or matrix plot plots the variables against each other; a linear relationship is confirmed by observing a straight-line trend.
SAS code:
proc sgscatter data=test;
   matrix x1 x2 x3 x4 / group=name; /* name is an optional grouping variable */
run;

Independence of error terms
We assume that the error terms are independent of each other. Dependence can arise when observations are collected over time (the problem of autocorrelation) or when observations are clustered; for example, students of the same school tend to be more alike than students of other schools.
Durbin-Watson test: the statistic is approximately 2 when the error terms are uncorrelated. Use the DW option in the MODEL statement of PROC REG to calculate the Durbin-Watson statistic, as in the sketch below.
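A minimal sketch of requesting the Durbin-Watson statistic (the dataset and variable names are placeholders):

proc reg data=test;
   model y = x1 x2 x3 / dw; /* prints the Durbin-Watson statistic for first-order autocorrelation */
run;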