Chapter 5
'Regression' (Latin) means 'retreat', 'going back to', 'stepping back'. In a regression we try to (stepwise) retreat from our data and explain them with one or more explanatory predictor variables. We draw a 'regression line' that serves as the (linear) model of our observed data.
Correlation
Regression
In a correlation, we look at the relationship between two variables without knowing the direction of causality
In a regression, we try to predict the outcome of one variable from one or more predictor variables; a direction from predictor(s) to outcome is therefore built into the model.
1 predictor = simple regression
>1 predictor = multiple regression
Regression
For a regression you do want to find out about the relations between variables, in particular whether one 'causes' the other. Therefore, an unambiguous causal template (what is cause, what is effect) has to be established before the analysis! This template makes the analysis inferential. Regression is THE statistical method underlying ALL inferential statistics (t-test, ANOVA, etc.). All that follows is a variation of regression.
In mathematics, a coefficient is a constant multiplicative factor of a certain object. For example, the coefficient in 9x² is 9. (http://en.wikipedia.org/wiki/Coefficient)
Yi = (b0 + b1Xi) + εi
Yi = outcome we want to predict
b0 = intercept of the regression line
b1 = slope (gradient) of the regression line
b0 and b1 are the regression coefficients
εi = residual (error) term
Yi = (-4 + 1.33Xi) + εi
'goodness-of-fit'
The line of best fit (regression line) is compared with the most basic model. The former should be significantly better than the latter. The most basic model is the mean of the data.
[Figure: the observed data with the mean of Y as the model]
The summed squared differences between the observed values and the regression line (SSR) are smaller; hence the regression line is a much better model of the data than the mean.
[Figure: the observed data with the regression line as the model, compared with the mean of Y]
SSM: sum of squared differences between the mean of Y and the regression line (our model)
R² = SSM / SST
The basic comparison in statistics is always between the amount of variance our model can explain and the total amount of variance there is. If the model is good, it explains a significant proportion of this overall variance.
This is the same measure as the R2 in chapter 4 on correlation. Take the square root of R2 and you have the Pearson correlation coefficient r!
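As an illustration of these sums of squares, here is a minimal Python sketch (with made-up data, not the record-sales file) that computes SST, SSR and SSM for a simple regression and checks that R² = SSM / SST and that the square root of R² equals Pearson's r.

```python
# A minimal sketch (made-up data): compute SST, SSR and SSM for a simple
# regression and verify R^2 = SSM / SST and sqrt(R^2) = |r|.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

b1, b0 = np.polyfit(x, y, deg=1)       # slope and intercept of the line of best fit
y_hat = b0 + b1 * x                    # values predicted by the regression line

sst = np.sum((y - y.mean()) ** 2)      # total variability around the mean (basic model)
ssr = np.sum((y - y_hat) ** 2)         # residual variability around the regression line
ssm = sst - ssr                        # improvement of the regression model over the mean

r_squared = ssm / sst
r_pearson = np.corrcoef(x, y)[0, 1]

print(r_squared, r_pearson ** 2)       # the two values agree
print(np.sqrt(r_squared), r_pearson)   # sqrt(R^2) = |r|
```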
If b1 = 0, this means:
A change of one unit in the predictor X does not change the predicted variable Y.
The gradient of the regression line is 0.
t = (b_observed - b_expected) / SE_b
Since b_expected = 0:
t = b_observed / SE_b
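A minimal sketch of this t-test on the slope, on made-up data: the slope and its standard error come from scipy's linregress and are divided as in the formula above.

```python
# A minimal sketch (made-up data): t = b_observed / SE_b, because the expected
# b under the null hypothesis is 0.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

res = stats.linregress(x, y)      # slope, intercept, r, p-value, standard error of the slope
t = res.slope / res.stderr        # b_observed / SE_b
print(t, res.pvalue)              # how (im)probable such a b is if the true b were 0
```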
[Figures: two SPSS 'Linear Regression' scatterplots of the data]
Model 1: R² = 33% of the total variance can be explained by the predictor 'advertisement'.

ANOVA (Model 1):
df: Regression = 1, Residual = 198, Total = 199; F = 99,587; Sig. = ,000(a)
F is the ratio of the mean squares, F = MSM / MSR.
a. Predictors: (Constant), ADVERTS Advertising Budget (thousands of pounds)
b. Dependent Variable: SALES Record Sales (thousands)
Regression coefficients b0, b1: b0 = intercept, where the regression line crosses the Y axis. When no money is spent (X = 0), 134,140 records are sold (b0 = 134,14 thousand).
Coefficients(a) table:
b1 = gradient: if the predictor X is increased by 1 unit (£1000), then 96,12 extra records will be sold (b1 = ,09612 thousand records per unit).
t = B / SE_B = 134,14 / 7,537 = 17,799
[Coefficients output, Model 1: B(ADVERTS) = ,09612; t = 17,799 (Constant) and 9,979 (ADVERTS); Beta = ,578]
t = B / SE_B
For the constant: 134,14 / 7,537 = 17,799. For ADVERTS: ,09612 / ,010 should give 9,612; however, SPSS reports t = 9,979.
What's wrong? Nothing, this is a rounding error. If you double-click on the output table 'Coefficients', more exact numbers are shown: 9.612E-02 = 0,09612448597388 and .010 = 0,00963236621523. If you re-compute the equation with these numbers, the result is correct: 0,09612448597388 / 0,00963236621523 = 9.979.
Yi = b0 + b1Xi = 134.14 + (.09612 × Advertising Budgeti)
Example: if £100,000 are spent on ads (X = 100, since the budget is measured in thousands of pounds), the predicted sales are 134.14 + (.09612 × 100) = 143.75, i.e. about 143,750 records.
Is that a good deal?
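A short sketch of the arithmetic behind this worked example, using the coefficients reported above:

```python
# A short sketch of the worked example, using the coefficients reported above.
# The budget is measured in thousands of pounds, so 100,000 pounds = 100 units.
b0, b1 = 134.14, 0.09612

ad_budget = 100                        # 100,000 pounds spent on advertising
sales = b0 + b1 * ad_budget            # predicted record sales in thousands
print(round(sales, 2))                 # 143.75 -> about 143,750 records sold
```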
Multiple regression
In a multiple regression, we predict the outcome of a dependent variable Y by a linear combination of >1 independent predictor variables Xi
Outcomei = (Modeli) + errori. Every predictor variable has its own coefficient b1, b2, ..., bn:
Yi = (b0 + b1X1i + b2X2i + ... + bnXni) + εi   (5.9)
b1X1 = 1st predictor variable with its coefficient
b2X2 = 2nd predictor variable with its coefficient, etc.
εi = residual term
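A minimal sketch of such a multiple regression in Python/statsmodels; the data are simulated, and the names adverts, airplay and attract merely mirror the record-sales example discussed later.

```python
# A minimal sketch of a multiple regression in statsmodels (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
adverts = rng.uniform(0, 2000, n)                  # X1
airplay = rng.integers(0, 60, n).astype(float)     # X2
attract = rng.integers(1, 10, n).astype(float)     # X3
sales = 100 + 0.1 * adverts + 3 * airplay + 10 * attract + rng.normal(0, 50, n)

X = sm.add_constant(np.column_stack([adverts, airplay, attract]))  # b0 + b1X1 + b2X2 + b3X3
model = sm.OLS(sales, X).fit()

print(model.params)      # b0, b1, b2, b3
print(model.rsquared)    # Multiple R^2: proportion of variance in Y explained by X1..X3
```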
3D scatterplot of the relation between record sales (Y), advertising budget (X1), and no. of plays on Radio 1 per week (X2)
3-D-scatterplot
If adjusted appropriately, you can see the regression plane and the confidence planes almost like lines
The regression plane is chosen so as to cover most of the data points in the three-dimensional data cloud
Sum of squares, R, R²
The terms we encountered for simple regression, SST, SSR, SSM, still mean the same, but are more complicated to compute now.
Instead of the simple correlation coefficient R, we use a multiple correlation coefficient, Multiple R.
Multiple R is the correlation between the predicted and the observed values of the outcome. As with simple R, Multiple R should be large. Multiple R² is a measure of the variance in Y explained by the predictor variables X1-Xn.
Methods of regression
The predictors of the model should be selected carefully, e.g. based on past research or theoretically well motivated.
Hierarchical method (ordered entry): known predictors are entered first, then new ones, either blockwise (all together) or stepwise.
Forced entry ('Enter'): all predictors are forced into the model simultaneously.
Stepwise methods:
Forward: predictors are introduced one by one, according to their predictive power.
Stepwise: same as Forward, plus a removal test.
Backward: predictors are judged against a removal criterion and eliminated accordingly.
Based on the theoretical literature, choose predictors in their order of importance; do not choose too many.
Run an initial multiple regression.
Eliminate useless predictors.
Take ca. n = 15 subjects per predictor.
1. The model must fit the data sample.
2. The model should generalize beyond the sample.
This is done by running the regression without that particular case and then using the new model to predict the value of the just-excluded case (its 'adjusted predicted value'). If the case is similar to all other cases, its adjusted predicted value will not differ much from its predicted value under the model that includes it.
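A minimal sketch of the idea, with made-up data and an arbitrarily chosen case: the adjusted predicted value is the prediction for a case from a model fitted without that case.

```python
# A minimal sketch (made-up data): adjusted predicted value = prediction for a
# case from a model fitted WITHOUT it.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])   # the last case is a potential outlier
y = np.array([2.0, 3.1, 3.9, 5.2, 6.1, 2.0])
i = 5                                            # index of the case we exclude

b1, b0 = np.polyfit(x, y, 1)                                   # model including the case
b1_i, b0_i = np.polyfit(np.delete(x, i), np.delete(y, i), 1)   # model without it

predicted = b0 + b1 * x[i]                   # ordinary predicted value
adjusted_predicted = b0_i + b1_i * x[i]      # adjusted predicted value
print(predicted, adjusted_predicted)         # a large difference flags an influential case
```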
Cook's distance measures the influence of a case on the overall model's ability to predict all cases. Leverage estimates the influence of the observed value of the outcome variable over the predicted values. (Field 2005, 736)
Leverage values lie between 0 and 1 and may be used to define cut-off points for excluding influential cases.
Mahalanobis distances measure the distance of cases from the means of the predictor variables.
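These influence statistics can be obtained outside SPSS as well. Below is a minimal statsmodels sketch on simulated data; the line relating SPSS's Mahalanobis distance to centred leverage is an assumption to be checked against your own output.

```python
# A minimal sketch (simulated data): Cook's distance and leverage per case.
# The Mahalanobis line assumes Mahalanobis = (n - 1) * (h - 1/n), i.e. the usual
# relation to centred leverage; verify against your SPSS output.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(10, 2, 50)
y = 3 + 0.5 * x + rng.normal(0, 1, 50)

results = sm.OLS(y, sm.add_constant(x)).fit()
influence = results.get_influence()

cooks_d, _ = influence.cooks_distance            # values > 1 may be a problem
leverage = influence.hat_matrix_diag             # hat values, between 0 and 1
mahalanobis = (len(x) - 1) * (leverage - 1 / len(x))

print(cooks_d.max(), leverage.max(), mahalanobis.max())
```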
Using the file dfbeta.sav: all data (including the outlier, case 30) vs. case 30 removed (with Data --> Select cases --> use filter variable).
All data: b0 = 29, b1 = -.90. Without case 30: b0 = 31, b1 = -1. Both regression coefficients, b0 (constant/intercept) and b1 (gradient/slope), changed!
[SPSS Coefficients tables for both analyses (with and without case 30); columns (Constant), X, t, Sig.; dependent variable: Y]
For case 30, the DFBeta values of the constant (dfb0) and of the predictor X (dfb1) are much higher than those of the other cases.
[Figure: regression lines with and without case 30 (the outlier)]
(Using the file pubs.sav.) The correlation between the no. of pubs in London districts and deaths, with and without the outlier. Note: the residual of the outlier relative to the regression line that includes it is small; its influence statistics, however, are huge.
Why? The outlier is the 'City of London' district, which has a lot of pubs but only few residents. The people drinking in those pubs are visitors; hence the ratio of deaths of residents to the overall alcohol consumption is relatively low.
Case | Leverage | St. DFFIT | St. DFB Intercept | St. DFB Pubs
1 | 0,04 | -0,74 | -0,74 | 0,37
2 | 0,03 | -0,41 | -0,41 | 0,18
3 | 0,02 | -0,18 | -0,17 | 0,07
4 | 0,02 | 0,02 | 0,02 | -0,01
5 | 0,01 | 0,2 | 0,19 | -0,06
6 | 0,01 | 0,4 | 0,38 | -0,1
7 | 0 | 0,68 | 0,63 | -0,12
8 | 0,86 | -4,60E+008 | 92676016 | -4,30E+008
Total N = 8
The residual of the outlier #8 is small because it actually sits very close to the regression line
If these assumptions are not met, we cannot draw valid conclusions from our model!
If our model is generalizable, it should be able to predict the outcome of a different sample.
Adjusted R² indicates the loss of predictive power ('shrinkage') if the model were applied to the population (Stein's formula):
adj R² = 1 - [(n-1)/(n-k-1)] × [(n-2)/(n-k-2)] × [(n+1)/n] × (1 - R²)
(n = sample size, k = number of predictors)
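A small sketch of this shrinkage formula as a Python function, evaluated with values as in the three-predictor record-sales model below (assuming R² ≈ .665, n = 200, k = 3):

```python
# A small sketch of the shrinkage formula above (Stein's formula).
def adjusted_r2_stein(r2, n, k):
    """Expected R^2 if the model were applied to the population (n cases, k predictors)."""
    factor = ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n)
    return 1 - factor * (1 - r2)

print(adjusted_r2_stein(0.665, 200, 3))   # ~0.653: very little shrinkage
```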
Data splitting: the entire sample is split into two halves. Regressions are computed and compared for both halves. A nice method, but one rarely has that many data.
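A minimal sketch of data splitting on simulated data: fit the regression on one half of the sample, then compare R² in both halves.

```python
# A minimal sketch of data splitting (simulated data).
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=200)
y = 1 + 2 * x + rng.normal(size=200)

half = len(x) // 2
b1, b0 = np.polyfit(x[:half], y[:half], 1)     # model estimated from the first half only

def r_squared(x_part, y_part):
    y_hat = b0 + b1 * x_part
    ss_r = np.sum((y_part - y_hat) ** 2)
    ss_t = np.sum((y_part - y_part.mean()) ** 2)
    return 1 - ss_r / ss_t

# similar values in both halves suggest the model generalizes
print(r_squared(x[:half], y[:half]), r_squared(x[half:], y[half:]))
```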
Sample size
The required sample size for a regression depends on the number of predictors k, the size of the effect, and the desired statistical power.
e.g., large effect --> n = 80 (for up to 20 predictors); medium effect --> n = 200; small effect --> n = 600
(Multi-)Collinearity
If 2 predictors are inter-correlated, we speak of collinearity. In the worst case, 2 variables have a correlation of 1. This is bad for a regression, since the regression can no longer be computed reliably: the variables become interchangeable. High collinearity is rare, but some degree of collinearity is always around. Problems with collinearity:
It underestimates the variance explained by a second predictor that is strongly inter-correlated with the first: the second predictor adds little unique variance, although taken by itself it would explain a lot.
We cannot decide which variable is important, i.e. which variable should be included.
The regression coefficients (b-values) become unstable.
(Collinearity is assessed via tolerance and VIF; see the sketch below.)
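A minimal sketch of this collinearity check, computing VIF and tolerance with statsmodels on simulated data in which one predictor is deliberately an almost exact copy of another:

```python
# A minimal sketch (simulated data): tolerance and VIF per predictor;
# x2 is deliberately an almost exact copy of x1.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # strongly inter-correlated with x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):               # skip the constant column
    vif = variance_inflation_factor(X, i)
    print(f"predictor {i}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")
```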
X1: Advertisement budget, X2: times played on radio, X3: attractiveness of the band
Since we know already that money for ads is a predictor, it will be entered into the regression first (1st block), and the 2 new predictors later (2nd block) --> hierarchical method ('Enter').
1st block: Var 1; 2nd block: Var 2 + 3
Regression diagnostics
The regression diagnostics are saved in the data file, each as a separate variable in a new column
Options
R of predictors 1-3 with the outcome; R of predictor 1 with the others; R of predictor 2 with the others; R of predictor 3 with the others; significance levels for all correlations.
Correlations: R's between all variables and significance levels. Predictor 2 (plays on radio) is the best predictor.
Predictors should not correlate with each other higher than R = .9 (collinearity).
Summary of model
Annotations to the Model Summary:
Model 1: only advertisement as predictor; Model 2: 3 predictors.
R = correlation between predictor(s) and outcome.
R Square Change: from 0 to ,335 (Model 1) and a further change of ,330 (Model 2).
df1 = number of predictors added; df2 = N - p - 1 (N = sample size, p = total number of predictors).
Durbin-Watson tests whether the errors are independent; a value close to 2 is OK.

Model Summary(c)
Model | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | Sig. F Change | Durbin-Watson
1 | ,331 | 65,9914 | ,335 | 99,587 | 1 | ,000 |
2 | ,660 | 47,0873 | ,330 | 96,447 | 2 | ,000 | 1,950

a. Predictors: (Constant), ADVERTS Advertising Budget (thousands of pounds)
b. Predictors: (Constant), ADVERTS Advertising Budget (thousands of pounds), ATTRACT Attractiveness of Band, AIRPLAY No. of plays on Radio 1 per week
c. Dependent Variable: SALES Record Sales (thousands)
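The 'Change Statistics' rest on an F-test for the increase in R². A small sketch that reproduces the F Change for Model 2 from the values in the table (R² goes from .335 to .665, 2 predictors added, n = 200, 3 predictors in total):

```python
# A small sketch of the F-test behind 'R Square Change'.
def f_change(r2_old, r2_new, k_added, n, k_new):
    """F for the increase in R^2 when k_added predictors are added to the model."""
    numerator = (r2_new - r2_old) / k_added
    denominator = (1 - r2_new) / (n - k_new - 1)
    return numerator / denominator

print(f_change(0.335, 0.665, 2, 200, 3))   # ~96.5, matching F Change = 96,447 up to rounding
```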
ANOVA for the model against the basic model (the mean)
Degrees of freedom in the ANOVA table: SST: df = # of cases minus 1 (200 - 1 = 199). SSR: df = # of cases minus # of coefficients b0, b1 (200 - 2 = 198). SSM: df = # of predictors.
ANOVA(c)
Model | | Sum of Squares | F | Sig.
1 | Regression (SSM) | 433687,8 | 99,587 | ,000(a)
1 | Residual (SSR) | 862264,2 | |
1 | Total (SST) | 1295952 | |
2 | Regression (SSM) | 861377,4 | 129,498 | ,000(b)
2 | Residual (SSR) | 434574,6 | |
2 | Total (SST) | 1295952 | |

a. Predictors: (Constant), ADVERTS Advertising Budget (thousands of pounds)
b. Predictors: (Constant), ADVERTS Advertising Budget (thousands of pounds), ATTRACT Attractiveness of Band, AIRPLAY No. of plays on Radio 1 per week
c. Dependent Variable: SALES Record Sales (thousands)
Sig. = significance level of the F-test.
Both Model 1 and Model 2 improve the prediction significantly; Model 2 (3 predictors) does even better than Model 1 (1 predictor).
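A small sketch that reproduces the F-ratio for Model 1 from the sums of squares and degrees of freedom above, using F = MSM / MSR:

```python
# A small sketch reproducing the F-ratio for Model 1 from the ANOVA table:
# F = MS_M / MS_R, where each mean square is a sum of squares divided by its df.
ss_m, df_m = 433687.8, 1       # Regression (SSM) row
ss_r, df_r = 862264.2, 198     # Residual (SSR) row

ms_m = ss_m / df_m
ms_r = ss_r / df_r
print(ms_m / ms_r)             # ~99.59, matching F = 99,587
```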
Record sales increase by ,511 standard deviations when the predictor ADVERTS changes by 1 SD; the standardized Betas of ADVERTS (,511) and AIRPLAY (,512) are almost equal, so these two predictors have comparable 'gains'. Model 1 is the same as in the first analysis.
Coefficients
Model | Predictor | B | Std. Error | Beta | t | Sig. | 95% CI Lower | 95% CI Upper
1 | (Constant) (b0) | 134,140 | 7,537 | | 17,799 | ,000 | 119,28 | 149,002
1 | ADVERTS Advertising Budget (thousands of pounds) (b1) | 9,61E-02 | ,010 | ,578 | 9,979 | ,000 | ,077 | ,115
2 | (Constant) (b0) | -26,613 | 17,350 | | -1,534 | ,127 | -60,830 | 7,604
2 | ADVERTS Advertising Budget (thousands of pounds) (b1) | 8,49E-02 | ,007 | ,511 | 12,261 | ,000 | ,071 | ,099
2 | AIRPLAY No. of plays on Radio 1 per week (b2) | 3,367 | ,278 | ,512 | 12,123 | ,000 | 2,820 | 3,915
2 | ATTRACT Attractiveness of Band (b3) | 11,086 | 2,438 | ,192 | 4,548 | ,000 | 6,279 | 15,894
The 'Coefficients' table tells us the individual contribution of each variable to the regression model. The standardized Betas tell us the relative importance of each predictor.
Pearson correlation of predictor × outcome, controlled for each single other predictor; Pearson correlation of predictor × outcome, controlled for all the other predictors. The Betas reflect each predictor's 'unique relationship' with the outcome.
Excluded variables
Excluded Variables(b), Model 1:
Predictor | Beta In | t | Partial Correlation | Tolerance | VIF | Minimum Tolerance
AIRPLAY No. of plays on Radio 1 per week | ,546(a) | 12,51 | ,665 | ,990 | 1,010 | ,990
ATTRACT Attractiveness of Band | ,281(a) | 5,136 | ,344 | ,993 | 1,007 | ,993
a. Predictors in the Model: (Constant), ADVERTS Advertising Budget (thousands of pounds) b. Dependent Variable: SALES Record Sales (thousands)
SPSS gives a summary of the predictors that were not entered into the model (here only for Model 1) and evaluates the contribution of these excluded variables.
Salesi = -26.61 + (0.08 × Adi) + (3.37 × Airplayi) + (11.09 × Attracti)
Interpretation: if Ad increases by 1 unit, sales increase by .08 units; if Airplay increases by 1 unit, sales increase by 3.37 units; if Attract increases by 1 unit, sales increase by 11 units, each independent of the contributions of the other predictors.
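A short sketch of a prediction from this equation; the input values (ad budget 500, 20 plays per week, attractiveness 7) are hypothetical, not from the data set:

```python
# A short sketch of a prediction from the fitted equation above
# (hypothetical input values).
b0, b_ad, b_air, b_attr = -26.61, 0.08, 3.37, 11.09

sales = b0 + b_ad * 500 + b_air * 20 + b_attr * 7
print(round(sales, 1))     # about 158.4 (thousand records)
```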
No Multicollinearity
[SPSS Collinearity Diagnostics: variance proportions of each predictor (Constant, ADVERTS, AIRPLAY, ...) across the dimensions (eigenvalues) of Model 1 (2 dimensions) and Model 2 (4 dimensions)]
Each predictor's variance proportions load highly on a different dimension (Eigenvalue) --> they are not intercorrelated, hence no collinearity
Casewise diagnostics
Casewise Diagnosticsa
Case Number | Std. Residual (z-value) | SALES Record Sales (thousands) | Predicted Value | Residual
1 | 2,125 (>5%) | 330,00 | 229,9203 | 100,0797
2 | -2,314 | 120,00 | 228,9490 | -108,9490
10 | 2,114 | 300,00 | 200,4662 | 99,5338
47 | -2,442 | 40,00 | 154,9698 | -114,9698
52 | 2,069 | 190,00 | 92,5973 | 97,4027
55 | -2,424 | 190,00 | 304,1231 | -114,1231
61 | 2,098 | 300,00 | 201,1897 | 98,8103
68 | -2,345 | 70,00 | 180,4156 | -110,4156
100 | 2,066 | 250,00 | 152,7133 | 97,2867
164 | -2,577 (>1%) | 120,00 | 241,3240 | -121,3240
169 | 3,061 (>1%) | 360,00 | 215,8675 | 144,1325
200 | -2,064 (>5%) | 110,00 | 207,2061 | -97,2061
The casewise diagnostics list cases that lie outside the boundaries of ±2 SD (in the z-distribution, only 5% of cases should lie beyond ±1.96 SD and only 1% beyond ±2.58 SD). Case 169 deviates most and needs to be followed up.
Identify influential cases via the case summaries (see the sketch below):
Of the standardized residuals, no more than 5% should have values exceeding ±2 and no more than 1% exceeding ±3.
Cook's distances > 1 might pose a problem.
Leverage values should not be more than two or three times the average leverage, (# of predictors + 1) / sample size.
Mahalanobis distance: values > 25 in large samples (n = 500) and > 15 in small samples (n = 100) can be problematic.
Absolute values of DFBeta should not exceed 1.
Determine the upper and lower limits of the covariance ratio (CVR): upper limit = 1 + 3 × (average leverage); lower limit = 1 - 3 × (average leverage).
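A minimal sketch of the first check in the list above, with simulated data and statsmodels: what proportion of standardized residuals exceeds ±1.96 and ±2.58?

```python
# A minimal sketch (simulated data): proportion of standardized residuals
# beyond +-1.96 and +-2.58.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 2 + 0.5 * x + rng.normal(size=200)

results = sm.OLS(y, sm.add_constant(x)).fit()
std_resid = results.resid / np.sqrt(results.mse_resid)   # standardized residuals

n = len(std_resid)
print((np.abs(std_resid) > 1.96).sum() / n)   # should not be much above .05
print((np.abs(std_resid) > 2.58).sum() / n)   # should not be much above .01
```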
Plot of standardized residuals (*ZRESID) against standardized predicted values (*ZPRED): the points are randomly and evenly dispersed --> the assumptions of linearity and homoscedasticity are met.
The distribution of the residuals is normal (left-hand picture); the observed probabilities correspond to the expected ones (right-hand side).
The Kolmogorov-Smirnov test for the standardized residuals is n.s. --> normal distribution.
Scatterplots of the residuals of the outcome variable against each of the predictors separately: no indication of outliers, an evenly spread cloud of dots (only the residual variance of 'attractiveness of band' seems somewhat uneven).