Lecture 4 Regression Analysis

This document discusses various statistical methods for testing relationships and associations, including Pearson R, Spearman Rho, Chi-square, and regression analysis. It outlines the assumptions and applications of these methods, as well as the use of software tools like Jamovi and EViews for parameter estimation. The document also provides an overview of regression analysis techniques and the steps involved in modeling and estimating relationships between variables.


LECTURE 4

TEST OF RELATIONSHIP AND ASSOCIATION


BA 502: Statistics with Computer Application
Learning Objectives
At the end of the discussion, learners will be able to:
1. Explain the test of relationship and association
2. Analyze cases where Pearson R, Spearman Rho, Chi-square, and regression analysis apply
3. Utilize both Jamovi and EViews in estimating the parameters
INFERENTIAL STATISTICS
Inferential statistics is a statistical method that deduces the characteristics of a bigger population from a small but representative sample. In other words, it allows the researcher to draw conclusions about a wider group, using a smaller portion of that group as a guide.
SOME INFERENTIAL STATISTICS TOOLS (UNIVARIATE)

Relationship/association tests covered in this lecture:
• Pearson R Correlation
• Spearman Rho Correlation
• Chi-Square
• Simple Regression Analysis


PEARSON PRODUCT MOMENT CORRELATION

The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the strength of a linear association between two variables and is denoted by r. Basically, a Pearson product-moment correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far away all these data points are from this line of best fit (i.e., how well the data points fit this new model/line of best fit).

The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value of 0 indicates that there is no association between the two variables. A value greater than 0 indicates a positive association; that is, as the value of one variable increases, so does the value of the other variable. A value less than 0 indicates a negative association; that is, as the value of one variable increases, the value of the other variable decreases.



PEARSON PRODUCT MOMENT CORRELATION
The stronger the association of the two variables, the closer the Pearson correlation coefficient, r, will be to either +1 or -1, depending on whether the relationship is positive or negative, respectively. A value of exactly +1 or -1 means that all of your data points lie on the line of best fit, with no data points showing any variation away from this line. Values of r between +1 and -1 (for example, r = 0.8 or -0.4) indicate that there is variation around the line of best fit; the closer the value of r is to 0, the greater the variation around the line of best fit.
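To make this concrete, here is a minimal sketch computing r in Python with scipy (the data values below are hypothetical, not from the lecture):

```python
# A minimal sketch of computing Pearson's r (illustrative data, not the lecture's).
import numpy as np
from scipy import stats

income = np.array([20, 35, 28, 50, 42, 61, 30, 55])   # hypothetical values
wealth = np.array([15, 40, 22, 70, 48, 90, 25, 66])   # hypothetical values

r, p_value = stats.pearsonr(income, wealth)
print(f"Pearson r = {r:.3f}, p-value = {p_value:.4f}")
```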


PEARSON PRODUCT MOMENT CORRELATION
Assumptions

1. Your two variables should be measured on a continuous scale (i.e., they are measured at the interval or ratio level).
2. Your two continuous variables should be paired, which means that each case (e.g., each participant) has two values: one for each variable. These "values" are also referred to as "data points".
3. There should be independence of cases, which means that the two observations for one case are unrelated to the observations for any other case.
4. There should be a linear relationship between your two continuous variables.
5. There should be homoscedasticity, which means that the variances along the line of best fit remain similar as you move along the line.
6. There should be no univariate or multivariate outliers.



SPEARMAN'S RANK-ORDER CORRELATION
The Spearman's rank-order correlation is the nonparametric version of the Pearson product-moment correlation. Spearman's correlation coefficient (ρ, also signified by rs) measures the strength and direction of association between two ranked variables. You need two variables that are either ordinal, interval, or ratio. Although you would normally hope to use a Pearson product-moment correlation on interval or ratio data, the Spearman correlation can be used when the assumptions of the Pearson correlation are markedly violated. However, Spearman's correlation determines the strength and direction of the monotonic relationship between your two variables, rather than the strength and direction of the linear relationship, which is what Pearson's correlation determines.
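A minimal sketch of the computation in Python with scipy (the ordinal ratings below are hypothetical):

```python
# A minimal sketch of Spearman's rho on ranked/ordinal-style data (illustrative values).
import numpy as np
from scipy import stats

satisfaction = np.array([1, 2, 2, 3, 4, 4, 5, 5])  # hypothetical ordinal ratings
loyalty      = np.array([2, 1, 3, 3, 4, 5, 4, 5])  # hypothetical ordinal ratings

rho, p_value = stats.spearmanr(satisfaction, loyalty)
print(f"Spearman rho = {rho:.3f}, p-value = {p_value:.4f}")
```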




CHI-SQUARE TEST FOR ASSOCIATION

The Chi-Square Test for Association is used to determine if there is any association between two variables. It is really a hypothesis test of independence. The null hypothesis is that the two variables are not associated, i.e., independent. The alternative hypothesis is that the two variables are associated.

Assumptions
1. Your two variables should be measured at an ordinal or nominal level (i.e., categorical data).
2. Each of your two variables should consist of two or more categorical, independent groups.
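As a quick illustration, a minimal sketch of the test in Python with scipy (the contingency counts are hypothetical):

```python
# A minimal sketch of a chi-square test for association on a 2x2 table
# (hypothetical counts: gender vs. product preference).
import numpy as np
from scipy import stats

observed = np.array([[30, 10],   # male:   prefers A, prefers B
                     [20, 25]])  # female: prefers A, prefers B

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, p-value = {p_value:.4f}, df = {dof}")
```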


Regression Analysis
Regression analysis is a statistical method used to
examine the relationship between one dependent variable
and one or more independent variables. The goal of
regression analysis is to understand the nature of the
relationship between the variables, make predictions, and
assess the strength and significance of the relationship. It
is widely used in various fields, including economics, finance,
biology, psychology, and many others.
Methods of Regression Analysis
1. Ordinary Least Squares (OLS). OLS is the most widely used method for
linear regression. It minimizes the sum of squared differences between the
observed and predicted values.
2. Weighted Least Squares (WLS). WLS is an extension of OLS that assigns
different weights to different observations. This is useful when there is
heteroscedasticity, meaning that the variance of the errors is not constant
across all levels of the independent variable.
3. Ridge Regression. Ridge regression introduces a regularization term to the
OLS objective function. It is used to prevent overfitting by penalizing large
coefficients. This is especially useful when dealing with multicollinearity.
Methods of Regression Analysis
4. Lasso Regression. Lasso regression, like ridge regression, adds a regularization term, but in this case, it includes an absolute value penalty. Lasso can lead to some coefficients being exactly zero, effectively performing variable selection.
5. Stepwise Regression. Stepwise regression is an automated variable
selection method that systematically adds or removes variables from the
model based on statistical criteria. It can be forward, backward, or a
combination of both.
6. Quantile Regression. Quantile regression estimates the relationship
between variables at different quantiles of the conditional distribution of the
dependent variable. It is useful when the relationship varies across different
parts of the distribution.
Methods of Regression Analysis
7. Bayesian Regression. Bayesian regression incorporates Bayesian principles into the modeling process. It provides a posterior distribution for the parameters, allowing for the incorporation of prior knowledge.
8. Nonlinear Regression. Nonlinear regression is used when the
relationship between the dependent and independent variables is not
linear. The model is specified as a nonlinear function of the parameters.
9. Logistic Regression. Logistic regression is used for binary or multinomial
classification problems. It models the probability of an event occurring
based on one or more predictor variables.
10. Poisson Regression. Poisson regression is used for count data,
assuming that the response variable follows a Poisson distribution.
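To make the contrast among the first few methods concrete, here is a minimal sketch fitting OLS, ridge, and lasso to the same data (scikit-learn; the data and alpha values are illustrative, not from the lecture):

```python
# A minimal sketch contrasting OLS, ridge, and lasso on the same hypothetical data.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                              # three predictors
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)   # X[:, 2] is irrelevant

for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("Lasso", Lasso(alpha=0.1))]:
    model.fit(X, y)
    print(name, np.round(model.coef_, 3))   # lasso tends to zero out X[:, 2]
```

Note how lasso can drive the coefficient of the irrelevant predictor to exactly zero, which is the variable-selection behavior described in item 4.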
Regression Modeling Steps
• Define problem or question
• Specify model
• Collect data
• Do descriptive data analysis
• Estimate unknown parameters
• Evaluate model
• Use model for prediction
METHOD OF LEAST SQUARES
• The straight line that best fits the data.
• Determine the straight line for which the differences between the actual values (Y) and the values that would be predicted from the fitted line of regression (Y-hat) are as small as possible.

(Figure: scatter plot of Sales against Radio and TV Ads with fitted line f(x) = 12.1620x + 699.9572.)
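As a small illustration of the idea, the sketch below fits a least-squares line with numpy (the ad/sales numbers are made up, not the slide's data):

```python
# A minimal sketch of fitting a least-squares line like the one in the figure
# (np.polyfit minimizes the sum of squared residuals; the data is hypothetical).
import numpy as np

ads   = np.array([10, 20, 30, 40, 50, 60, 70])               # radio and TV ads
sales = np.array([820, 950, 1080, 1150, 1320, 1410, 1550])   # sales

slope, intercept = np.polyfit(ads, sales, deg=1)
print(f"fitted line: sales = {slope:.3f} * ads + {intercept:.3f}")
```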


GOAL
Develop a statistical model that can predict the values of a dependent (response) variable based upon the values of the independent (explanatory) variables.

SIMPLE REGRESSION
• β₁ represents the unit change in Y per unit change in X.
• Does not take into account any other variable besides the single independent variable.
• A statistical model that utilizes one quantitative independent variable "X" to predict the quantitative dependent variable "Y."

MULTIPLE REGRESSION
• βi represents the unit change in Y per unit change in Xi.
• Takes into account the effect of the other βi's ("net regression coefficient").
• A statistical model that utilizes two or more quantitative and qualitative explanatory variables (x1, ..., xp) to predict a quantitative dependent variable Y.
Caution: have at least two or more quantitative explanatory variables (rule of thumb).
SIMPLE LINEAR REGRESSION

Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables:
• One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
• The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
Simple linear regression gets its adjective "simple" because it concerns the study of how only one independent variable relates to one dependent variable.
Assumptions
• Linearity. The relationship between the dependent variable and the
independent variables is assumed to be linear. The model assumes
that changes in the independent variables lead to proportional
changes in the dependent variable.
• No autocorrelation. The errors are assumed to be independent of
each other, meaning that there should be no systematic pattern in the
residuals over time or across observations.
• Homoscedasticity. The variance of the errors should be constant
across all levels of the independent variables. In other words, the
spread of the residuals should be roughly constant as you move
along the range of predicted values.



Assumptions

• Normality of Errors. The errors are assumed to be normally distributed. While this assumption is not crucial for large sample sizes due to the Central Limit Theorem, it is important for small sample sizes to ensure the validity of statistical inference, such as hypothesis testing and confidence intervals.
• No Perfect Multicollinearity. The independent variables should not be perfectly correlated. Perfect multicollinearity occurs when one independent variable can be exactly predicted from another, leading to difficulties in estimating the individual coefficients.


Test of Assumptions

Assumption                      Test
Linearity                       Scatter Diagram
No Autocorrelation              Durbin-Watson
Normality of Errors             Jarque-Bera
No Heteroskedasticity           White's Test or Breusch-Pagan Test
No Perfect Multicollinearity    Variance Inflation Factor (VIF)

Note: Multicollinearity is applicable only in multiple regression.
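These checks can also be run in code; below is a minimal sketch using statsmodels' diagnostic functions on hypothetical data:

```python
# A minimal sketch of the assumption tests above using statsmodels
# (hypothetical data; Durbin-Watson, Jarque-Bera, Breusch-Pagan, and VIF).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson, jarque_bera
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 2)))     # constant + two predictors
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=100)

res = sm.OLS(y, X).fit()

print("Durbin-Watson:", durbin_watson(res.resid))         # near 2: no autocorrelation
jb_stat, jb_p, skew, kurtosis = jarque_bera(res.resid)
print("Jarque-Bera p-value:", jb_p)                       # normality of errors
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(res.resid, X)
print("Breusch-Pagan p-value:", lm_p)                     # homoskedasticity
for i in range(1, X.shape[1]):                            # skip the constant
    print(f"VIF(x{i}):", variance_inflation_factor(X, i))
```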


Basic Regression Analysis
• EViews has a very powerful and easy-to-use estimation toolkit
that allows you to estimate from the simplest to the most complex
regression analysis.
• This tutorial explains basic regression techniques in EViews for
single equation regressions using cross-section data.
• The main topics include:
 Specifying and estimating an equation
 Equation Objects (saving, labeling, freezing, printing)
 Equation Output: Analyzing and Interpreting results
 Multiple Regression Analysis
 Estimation with Data Expressions and Functions
 Post Estimation: Working with Equations
 Hypothesis testing
 Estimation Options (robust standard errors, weighted least squares)
Description of Data File
• Data_wf1 has the following data on 9,275 individuals*:

Wealth – net total financial wealth (in thousands of dollars)
Income – annual income (in thousands of dollars)
Male – dummy variable, equal to 1 if male, 0 otherwise
Married – dummy variable, equal to 1 if married, 0 otherwise
Age – age in years (minimum age in the dataset is 25 years)
Fsize – family size; the number of individuals living in the family

* This data is from Wooldridge, Introductory Econometrics (4th Edition).
The Equation Object
• Single equation regression estimation in EViews is performed using the Equation Object.
There are a number of ways to create a simple OLS Equation Object:
1. From the Main menu, select Object → New Object → Equation.
2. From the Main menu, select Quick → Estimate Equation.
The Equation Box

• In all cases, the Equation Estimation box appears. You need to specify three things in this dialogue box:
1. The equation specification, either by (a) list or (b) formula (explained in future tutorials).
2. The estimation method.
3. The sample.
Specifying an Equation by List
• The easiest way to specify a linear equation is to provide a list of variables that you wish to use in the equation.
• Suppose that you would like to know how well income explains financial wealth. To accomplish this, type in the Equation Estimation box:
1. The dependent variable (wealth).
2. "c" for the constant.
3. The independent variable (income).
Notice that the entries are all separated by spaces.
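For comparison, the same "wealth c income" specification can be estimated outside EViews; here is a minimal sketch with statsmodels, using simulated data standing in for Data_wf1:

```python
# A minimal sketch of the EViews list specification "wealth c income"
# using statsmodels (simulated data standing in for Data_wf1).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
income = rng.gamma(shape=2.0, scale=20.0, size=9275)         # in thousands of dollars
wealth = -20.0 + 1.0 * income + rng.normal(0, 59.0, 9275)    # in thousands of dollars

X = sm.add_constant(income)    # the constant plays the role of "c"
res = sm.OLS(wealth, X).fit()
print(res.summary())           # coefficients, t-statistics, F-statistic, R-squared
```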
Specifying Equation by List (cont’d)

• Alternatively, you can also create an Equation simply by selecting the series and opening them as an Equation. To create an equation:
1. Select wealth and income by clicking on these series in the workfile (press CTRL to select multiple series). Notice that you need to select the dependent variable (wealth) first.
2. Right click and select Open → as Equation.
Specifying Equation by List (cont’d)

3. The Equation Estimation dialog box opens, listing your independent variable, dependent variable, and the constant.
4. Click OK to estimate the regression.
Estimation Method
• After you specify the variable list, you need to select an estimation method.
• Click on the Method option, and you see a drop-down menu listing the various estimation methods you can perform in EViews.
• Standard, single-equation regression is performed using "Least Squares" (LS).
• In this tutorial we will use Least Squares and defer discussion of more advanced estimation techniques to subsequent tutorials.
Estimation Sample
• The third item you need to specify in the equation box is the Sample. You can specify the sample period in the sample space of the equation box.
For example, to estimate the regression over the entire sample:
1. You need to include all observations (1 to 9275).
2. Click OK to estimate the regression. As seen in the Equation Output, EViews has included all observations when estimating this regression.
Estimation Sample (cont’d)

• What if you want to estimate the effect of income on wealth, but only for a subset of individuals, e.g., married men? To target a specific sample you need to:
1. Specify the sample: in this case married men, so type: if married=1 and male=1.
2. Click OK to estimate the regression. As seen in the Equation Output, EViews has included only a subset of the total observations (534 obs.).
SIMPLE LINEAR REGRESSION

Dependent Variable: WEALTH
Method: Least Squares
Date: 04/05/25  Time: 10:44
Sample: 1 9275
Included observations: 9275

Variable              Coefficient   Std. Error   t-Statistic   Prob.
C                     -20.17948     1.176434     -17.15309     0.0000
INCOME                0.999911      0.025543     39.14569      0.0000

R-squared             0.141817      Mean dependent var      19.07168
Adjusted R-squared    0.141724      S.D. dependent var      63.96384
S.E. of regression    59.25813      Akaike info criterion   11.00190
Sum squared resid     32562380      Schwarz criterion       11.00344
Log likelihood        -51019.30     Hannan-Quinn criter.    11.00242
F-statistic           1532.385      Durbin-Watson stat      1.939237
Prob(F-statistic)     0.000000

 Start with the p-value of the F-statistic. The p-value of the F-statistic should be <.05 in order to say that the whole model is significant.
• E.g., the F-statistic is 1532.385 with a p-value of <.001; thus the whole model is significant.

 Followed by the Adjusted R-squared. The adjusted R² describes the degree of association between the dependent and independent variables.
• E.g., the adjusted R² is 0.141724, which means that 14.17% of the changes in wealth are explained by changes in income.


LINEAR MODEL
The relationship between one dependent variable and one or more independent variables is a linear function:

Y = β₀ + β₁X + ε

where Y is the dependent variable, β₀ is the population Y-intercept, β₁ is the slope (coefficient of the independent variable), X is the independent variable (predictor), and ε is the error term.
SIMPLE LINEAR REGRESSION (same output as above)

 Check the p-value (Prob.) of the t-statistic of the intercept (C) and interpret its coefficient. The p-value of the t-statistic should be <.05 in order to say that the coefficient of the intercept is significant and included in the regression model.
• E.g., the t-statistic of the intercept (C) is -17.15309 with a p-value of <.001; thus the coefficient of the intercept (-20.17948) is significant.
• This means that when income is zero, an individual's wealth is equal to -20.17948 (thousand dollars).

SIMPLE LINEAR REGRESSION (same output as above)

 Check the p-value (Prob.) of the t-statistic of the independent variable and interpret its coefficient. The p-value of the t-statistic should be <.05 in order to say that the coefficient of the independent variable is significant and included in the regression model.
• E.g., the t-statistic of income is 39.14569 with a p-value of <.001; thus the coefficient of income (0.999911) is significant.
• Since both income and wealth are measured in thousands of dollars, this means that for every $1 increase in income, there is a corresponding increase of about $1.00 (0.999911) in wealth.

LINEAR MODEL
Dependent Variable: AFFECTIVE_COM
Method: Least Squares
Date: 12/10/23  Time: 08:47
Sample: 1 184
Included observations: 184

Variable               Coefficient   Std. Error   t-Statistic   Prob.
C                      3.095244      0.316778     9.771016      0.0000
AFFECTIVE_CYNICISM     -0.220940     0.087684     -2.519742     0.0126
BEHAVIORAL_CYNICISM    -0.082642     0.066187     -1.248619     0.2134
COGNITIVE_CYNICISM     0.377496      0.077939     4.843456      0.0000

R-squared             0.251872      Mean dependent var      3.769928
Adjusted R-squared    0.239403      S.D. dependent var      0.764792
S.E. of regression    0.666992      Akaike info criterion   2.049422
Sum squared resid     80.07813      Schwarz criterion       2.119312
Log likelihood        -184.5469     Hannan-Quinn criter.    2.077750
F-statistic           20.20020      Durbin-Watson stat      2.141296
Prob(F-statistic)     0.000000

Writing the fitted model (BEHAVIORAL_CYNICISM is omitted because its p-value, 0.2134, exceeds .05):

Affective_com = 3.095 − 0.220 Affective_cy + 0.377 Cognitive_cy + 0.667
(equivalently: Acom = 3.095 − 0.220 Acy + 0.377 Ccy + 0.667, where 0.667 is the model's standard error of regression)
Dependent Variable: OVERALL_A
Method: Least Squares
Date: 12/09/23  Time: 10:02
Sample: 1 184
Included observations: 184
HAC standard errors & covariance (Bartlett kernel, Newey-West fixed bandwidth = 5.0000)

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             3.095244      0.451116     6.861299      0.0000
OVERALL_AD    -0.220940     0.123543     -1.788363     0.0754
OVERALL_BD    -0.082642     0.093369     -0.885120     0.3773
OVERALL_CD    0.377496      0.096549     3.909874      0.0001

R-squared             0.251872      Mean dependent var        3.769928
Adjusted R-squared    0.239403      S.D. dependent var        0.764792
S.E. of regression    0.666992      Akaike info criterion     2.049422
Sum squared resid     80.07813      Schwarz criterion         2.119312
Log likelihood        -184.5469     Hannan-Quinn criter.      2.077750
F-statistic           20.20020      Durbin-Watson stat        2.141296
Prob(F-statistic)     0.000000      Wald F-statistic          23.60263
                                    Prob(Wald F-statistic)    0.000000

Durbin-Watson table bounds: LL = 1.7236, UL = 1.791.

Correlation matrix:
              OVERALL_AD      OVERALL_BD      OVERALL_CD
OVERALL_AD    1               0.65468920...   -0.2454725...
OVERALL_BD    0.65468920...   1               -0.2962268...
OVERALL_CD    -0.2454725...   -0.2962268...   1
QUANTILE REGRESSION
Dependent Variable: AFFECTIVE_COM; Method: Quantile Regression; Sample: 1 184; Included observations: 184; Huber Sandwich Standard Errors & Covariance; Sparsity method: Kernel (Epanechnikov) using residuals; Bandwidth method: Hall-Sheather; estimation successfully identifies a unique optimal solution. Mean dependent var 3.769928; S.D. dependent var 0.764792.

tau = 0.25 (bw = 0.1183):
Variable               Coefficient   Std. Error   t-Statistic   Prob.
C                      3.523810      0.731100     4.819874      0.0000
AFFECTIVE_CYNICISM     -0.279221     0.141666     -1.970976     0.0503
BEHAVIORAL_CYNICISM    -0.326840     0.082393     -3.966851     0.0001
COGNITIVE_CYNICISM     0.270563      0.181384     1.491659      0.1375
Pseudo R-squared 0.219526; Adjusted R-squared 0.206519; S.E. of regression 0.878479; Objective 38.17817; Quantile dependent var 3.166667; Restr. objective 48.91667; Sparsity 1.950619; Quasi-LR statistic 58.72188; Prob(Quasi-LR stat) 0.000000

Median (tau = 0.50, bw = 0.17082):
Variable               Coefficient   Std. Error   t-Statistic   Prob.
C                      2.750000      0.655948     4.192404      0.0000
AFFECTIVE_CYNICISM     -0.166667     0.145062     -1.148931     0.2521
BEHAVIORAL_CYNICISM    1.82E-10      0.113380     1.60E-09      1.0000
COGNITIVE_CYNICISM     0.416667      0.149772     2.782013      0.0060
Pseudo R-squared 0.081092; Adjusted R-squared 0.065777; S.E. of regression 0.677153; Objective 48.39583; Quantile dependent var 4.000000; Restr. objective 52.66667; Sparsity 1.772560; Quasi-LR statistic 19.27532; Prob(Quasi-LR stat) 0.000240

tau = 0.75 (bw = 0.1183):
Variable               Coefficient   Std. Error   t-Statistic   Prob.
C                      3.624317      0.567325     6.388434      0.0000
AFFECTIVE_CYNICISM     -0.316940     0.175897     -1.801853     0.0732
BEHAVIORAL_CYNICISM    0.087432      0.120578     0.725106      0.4693
COGNITIVE_CYNICISM     0.321038      0.100721     3.187410      0.0017
Pseudo R-squared 0.111427; Adjusted R-squared 0.096617; S.E. of regression 0.854605; Objective 37.39413; Quantile dependent var 4.000000; Restr. objective 42.08333; Sparsity 2.119847; Quasi-LR statistic 23.59519; Prob(Quasi-LR stat) 0.000030
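Quantile regressions like these can also be estimated with statsmodels; below is a minimal sketch at the same three quantiles (the data frame is simulated, not the lecture's dataset):

```python
# A minimal sketch of quantile regression at the three quantiles shown above
# (statsmodels; the DataFrame and its values are simulated, not the lecture's data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "affective_cynicism": rng.normal(3, 1, 184),
    "cognitive_cynicism": rng.normal(3, 1, 184),
})
df["affective_com"] = (3.1 - 0.2 * df["affective_cynicism"]
                       + 0.4 * df["cognitive_cynicism"]
                       + rng.normal(0, 0.7, 184))

model = smf.quantreg("affective_com ~ affective_cynicism + cognitive_cynicism", df)
for tau in (0.25, 0.50, 0.75):
    res = model.fit(q=tau)   # one fit per quantile of the conditional distribution
    print(f"tau={tau}:", res.params.round(3).to_dict())
```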
PERFORMING THE REGRESSION ANALYSIS IN SPSS
1. Go to Analyze → Regression → Linear.
2. Click Statistics → check "Model Fit", "R Square Change", "Collinearity Diagnostics", and "Durbin-Watson", then click "Continue".
3. Click Plots → transfer "*ZRESID" to the Y box and "*ZPRED" to the X box → check "Histogram" and "Normal Probability plot", then click "Continue".
4. Click "OK".
CHECKING THE ASSUMPTIONS

(Diagnostic plots: linearity, normality, and homoskedasticity.)

INTERPRETATION OF THE RESULT OUTPUT
The Durbin-Watson statistic determines whether the model satisfies the assumption of independence of errors/residuals. Refer to a Durbin-Watson table (https://www.real-statistics.com/statistics-tables/durbin-watson-table/) and look for the values of the Lower Limit (LL) and Upper Limit (UL). If the Durbin-Watson result lies between the LL and UL, then the model fails to satisfy the assumption. In this case, D-W = 1.542 and, looking at the D-W table with k = 1 and n = 22, LL = 0.997 and UL = 1.174; the derived D-W does not lie between the LL and UL, thus it satisfies the assumption.

The Adjusted R-squared indicates how strong the association between the DV and IV is. In this case, 46% of the variation/changes in Sales can be explained by Radio Advertisement.
If the p-value is <.05, the null hypothesis that there is no significant relationship/association between the dependent variable and the independent variable(s) is rejected; otherwise, we fail to reject it and conclude there is no association.
Note: If it is >.05, there is no need to interpret the other parts.

If the p-value is <.05, the null hypothesis stated as "the derived coefficient in the model is not significant" is rejected; otherwise, we fail to reject it and conclude there is no association.
Note: If it is >.05, there is no need to interpret the derived coefficient (B).

 Intercept = 699.96 means that when radio ads equal zero (0), sales equal $699.96.
 Slope/coefficient of the independent variable = 12.162 means that for every one (1) unit increase in radio ads, there is a corresponding increase of $12.16 in sales.
MULTIPLE LINEAR REGRESSION

In multiple linear regression analysis, the same concepts apply; however, there is one additional assumption that needs to be satisfied: no multicollinearity exists in the model.

STEPS OF MULTIPLE LINEAR REGRESSION ANALYSIS

• Examine variation measures
• Do residual analysis
• Test parameter significance
  – Overall model
  – Portions of model
  – Individual coefficients
• Test for multicollinearity
(A minimal code sketch of these steps appears below.)
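Here is that sketch with statsmodels, using simulated data with variable names mirroring the cynicism/commitment example (not the actual dataset):

```python
# A minimal sketch of the multiple-regression steps above with statsmodels
# (simulated data; variable names mirror the lecture's example).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "affective_cynicism": rng.normal(3, 1, 184),
    "behavioral_cynicism": rng.normal(3, 1, 184),
    "cognitive_cynicism": rng.normal(3, 1, 184),
})
df["affective_com"] = (3.1 - 0.2 * df["affective_cynicism"]
                       + 0.4 * df["cognitive_cynicism"]
                       + rng.normal(0, 0.7, 184))

res = smf.ols("affective_com ~ affective_cynicism + behavioral_cynicism"
              " + cognitive_cynicism", df).fit()
print(res.summary())   # overall model, individual coefficients, p-values

exog = res.model.exog  # design matrix including the intercept column
for i, name in enumerate(res.model.exog_names[1:], start=1):  # skip the intercept
    print(name, round(variance_inflation_factor(exog, i), 2))  # multicollinearity check
```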
MULTICOLLINEARITY
• High correlation between X variables.
• Coefficients measure a combined effect.
• Leads to unstable coefficients, depending on the X variables in the model.
• Always exists; it is a matter of degree.
• Example: using both the total number of print ads and newspaper ads as explanatory variables in the same model.

DETECTING AND REMEDYING MULTICOLLINEARITY
• Examine the correlation matrix: correlations between pairs of X variables are greater than with the Y variable.
• Use the VIF result:
  – VIF starts at 1 and has no upper limit.
  – VIF = 1: no correlation between the independent variable and the other variables.
  – VIF exceeding 5 or 10 indicates high multicollinearity between this independent variable and the others.
• A few remedies:
  – Obtain new sample data.
  – Eliminate one of the correlated X variables.

Both tests (correlation matrix and VIF) indicate that NO COLLINEARITY EXISTS.

Sales = 12.032 Radio Ad + 245.23


THANK YOU
