EDU6950 Advance Statistics in Education Assignment 2-Multiple Regression Analysis

This document provides instructions for a multiple regression analysis assignment using data from HATCO to analyze the relationship between product usage (the dependent variable) and seven independent variables relating to customers' perceptions of HATCO's performance. The assignment requires submitting a report analyzing this relationship and identifying factors that influence increased product usage. The document outlines the stages of a multiple regression analysis, including assessing assumptions, evaluating the regression model, and determining influential observations.

Uploaded by

khalifa

EDU6950 Advance Statistics in Education

Assignment 2-Multiple Regression Analysis

In this second assignment, you are required to submit a report on the relationship
between HATCO's seven independent variables (delivery speed, price level, price
flexibility, manufacturer image, overall service, salesforce image and product quality)
and product usage. Submit your report before 26th October 2018.

Multiple regression analysis is a statistical technique used to analyze the relationship
between a single metric dependent (criterion) variable and several metric
independent (predictor) variables. The objective is to use the independent variables,
whose values are known, to predict the single dependent value selected by the researcher.
The regression procedure weights each independent variable in the set. The weights
denote the relative contribution of the independent variables (IVs) to the overall
prediction and form the regression variate, a linear combination of the
independent variables that best predicts the dependent variable.
To apply multiple regression analysis:
1. Both the dependent variable and the independent variables must be metric or
appropriately transformed.
2. The dependent variable and the independent variables must be decided before deriving
the regression equation.

Stage 1: Objective of Multiple Regression


Using the HATCO data in this study, the objectives of the multiple regression analysis are:
1. To predict the product usage levels of customers based on their perceptions of
HATCO's performance (Prediction).
2. To identify the factors that lead to increased product usage, for application in
marketing campaigns (Explanation).
To apply the regression procedure, Usage Level (X9) was selected as the dependent variable
(Y), to be predicted by independent variables representing perceptions of HATCO's
performance. The following seven variables were included as independent variables
(referred to from here on as predictor variables):
X1 Delivery Speed
X2 Price Level
X3 Price Flexibility
X4 Manufacturer Image
X5 Service
X6 Salesforce Image
X7 Product Quality

The relationship between the seven predictor variables and usage level was assumed to be
statistical, not functional, because it involves perceptions of performance and may
contain some measurement error.

The predicted regression model can be stated as below:

(Predicted usage level) Y = b0 + b1X1 + b2X2 + b3X3 + b4X4 + b5X5 + b6X6 + b7X7
where

b0 = constant (intercept)
b1 = change in usage level associated with a unit change in Delivery Speed
b2 = change in usage level associated with a unit change in Price Level
b3 = change in usage level associated with a unit change in Price Flexibility
b4 = change in usage level associated with a unit change in Manufacturer Image
b5 = change in usage level associated with a unit change in Overall Service
b6 = change in usage level associated with a unit change in Salesforce Image
b7 = change in usage level associated with a unit change in Product Quality
X1 = Delivery Speed
X2 = Price Level
X3 = Price Flexibility
X4 = Manufacturer Image
X5 = Overall Service
X6 = Salesforce Image
X7 = Product Quality

Stage 2: Research Design of a Multiple Regression Analysis


The HATCO survey obtained 100 respondents, resulting in 100 observations. The first
question to be answered concerning sample size is the level of relationship (R2) that
can be detected reliably with the proposed regression analysis.
A sample of 100 with seven potential independent variables is able to detect
relationships with R2 values of approximately 17 percent at a power of .80 with the
significance level set at .01. The full seven-predictor model in Table 1 below has
R2 = .775 (adjusted R2 = .758), well above this threshold. Maintaining power at .80
in multiple regression requires a minimum sample of 50 and preferably 100
observations for most research situations.

Table 1
Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .880(a)    .775       .758                4.42627

a. Predictors: (Constant), Product Quality, Overall Service, Salesforce Image, Price flexibility, Price level, Manufacturer Image, Delivery Speed
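
As a quick consistency check on Table 1, the adjusted R Square can be recomputed from R Square, the sample size and the number of predictors. The sketch below assumes the standard adjustment formula; the function name is illustrative and not part of the SPSS output.

```python
def adjusted_r_square(r_square: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_square) * (n - 1) / (n - k - 1)


# Values from Table 1: R Square = .775, n = 100 observations, k = 7 predictors.
adj = adjusted_r_square(0.775, n=100, k=7)
print(f"Adjusted R Square: {adj:.3f}")
```

The result agrees with the Adjusted R Square column of Table 1 to three decimals.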

There are no missing data because all respondents gave complete responses. Table 2
below shows the descriptive statistics, with N = 100 for every variable.

Table 2

Descriptive Statistics

                      Mean      Std. Deviation   N
Usage Level           46.1000   8.98877          100
Delivery Speed        3.5150    1.32073          100
Price level           2.3740    1.20098          100
Price flexibility     7.8940    1.38650          100
Manufacturer Image    5.2480    1.13141          100
Overall Service       2.9160    .75126           100
Salesforce Image      2.6650    .77085           100
Product Quality       6.9710    1.58524          100

The detection level above is the minimum R2 that can be found statistically significant
with a power of .80 for the given number of independent variables, a sample size of 100
and a significance level (α) of .01. The ratio for this data is about 14 observations
per variable (100/7), approaching the desired level of 15 observations per variable and
above the minimum requirement of 5:1.
The proposed regression analysis was deemed sufficient to identify not only statistically
significant relationships but also relationships with managerial significance, because
the sample is adequate and there are no missing data.
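
The observation-to-variable ratio discussed above is simple arithmetic. The sketch below works it out for 100 observations and seven predictors and compares it with the 5:1 minimum and 15:1 desired levels named in the text; the variable names are illustrative.

```python
n_observations = 100  # HATCO respondents
n_predictors = 7      # X1 to X7

ratio = n_observations / n_predictors
print(f"Observations per predictor: {ratio:.1f}")

MIN_RATIO = 5       # minimum requirement stated in the text
DESIRED_RATIO = 15  # desired level stated in the text

meets_minimum = ratio >= MIN_RATIO
meets_desired = ratio >= DESIRED_RATIO
print(f"Meets 5:1 minimum: {meets_minimum}, meets 15:1 desired level: {meets_desired}")
```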

Stage 3: Assumptions in Multiple Regression Analysis


The assumptions in Multiple Regression Analysis to be examined are in three areas:
a) Linearity of the phenomenon measured
b) Constant variance & Independence of the error terms
c) Normality of the error term distribution

a. Linearity refers to the relationship between each predictor variable and the
dependent variable. The linearity check was based on the scatterplots summarized in
Table 3 below. The scatterplots show a linear relationship between each of the
predictor variables and the dependent variable.

Table 3
Scatterplots of Usage Level (X9, dependent variable) against each predictor variable

Independent Variable        Relationship shown
Delivery Speed (X1)         Positive linear relationship
Price Level (X2)            Positive linear relationship
Price Flexibility (X3)      Positive linear relationship
Manufacturer Image (X4)     Positive linear relationship
Service (X5)                Positive linear relationship
Salesforce Image (X6)       Positive linear relationship
Product Quality (X7)        Positive linear relationship

b. Homoscedasticity describes data for which the variance of the error
terms (e) appears constant over the range of values of a predictor variable.
Based on the scatterplots, all predictor variables show homoscedasticity.

c. Normality

There are three ways to assess normality:

1. The simplest diagnostic for the set of independent variables in the equation is
a histogram of residuals, with a visual check for a distribution approximating
the normal distribution. (See Histogram below)
2. A better method is the use of normal probability plots.
3. A normality test of the HATCO data using the Shapiro-Wilk test, which is suited
especially to small samples. Table 4 shows the Shapiro-Wilk results for normality testing.

Table 4

Shapiro-Wilk
Statistic df Sig.
Delivery Speed .985 100 .341
Price Level .969 100 .018
Price Flexibility .950 100 .001
Manufacturer Image .982 100 .183
Overall Service .986 100 .366
Salesforce Image .963 100 .007
Product Quality .971 100 .028
Usage Level .985 100 .320
Satisfaction Level .977 100 .074

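A variable is treated as normally distributed if the p value is not significant (Sig.
> .05). From Table 4, Delivery Speed, Manufacturer Image, Overall Service, Usage Level
and Satisfaction Level meet this criterion, while Price Level (.018), Price Flexibility
(.001), Salesforce Image (.007) and Product Quality (.028) depart from normality.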
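
The decision rule for Table 4 can be applied mechanically. The sketch below copies the Sig. column of Table 4 into a dictionary (the dictionary and variable names are illustrative) and separates the variables at the .05 level.

```python
ALPHA = 0.05  # significance level used for the normality decision

# Shapiro-Wilk Sig. values copied from Table 4.
shapiro_sig = {
    "Delivery Speed": 0.341,
    "Price Level": 0.018,
    "Price Flexibility": 0.001,
    "Manufacturer Image": 0.183,
    "Overall Service": 0.366,
    "Salesforce Image": 0.007,
    "Product Quality": 0.028,
    "Usage Level": 0.320,
    "Satisfaction Level": 0.074,
}

# Sig. > ALPHA: no significant departure from normality.
normal = [name for name, p in shapiro_sig.items() if p > ALPHA]
non_normal = [name for name, p in shapiro_sig.items() if p <= ALPHA]

print("Consistent with normality:", ", ".join(normal))
print("Significant departure:", ", ".join(non_normal))
```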

Stage 4: Assessing the Regression Model and Assessing Overall Model Fit

This stage has three basic tasks:
1. Select a method for specifying the regression model to be estimated.
2. Assess the statistical significance of the overall model in predicting the
dependent variable.
3. Determine whether any of the observations exert an undue influence on the
results.
For this analysis, I will use stepwise estimation (one of the sequential search
methods). Stepwise estimation is a method of selecting variables for inclusion in the
regression model that starts by selecting the best predictor of the dependent variable.
Additional predictor variables are then selected in terms of the incremental explanatory
power they can add to the regression model.

Stage 5: Interpreting the Regression Variate


Using the Regression Coefficients (See Table 5 below)

Table 5
Coefficients(a)

                          Unstandardized Coefficients   Standardized Coefficients
Model                     B          Std. Error         Beta                        t        Sig.
1  (Constant)             -10.580    4.917                                          -2.152   .034
   Delivery Speed         .665       1.815              .098                        .366     .715
   Price level            .087       1.862              .012                        .047     .963
   Price flexibility      3.391      .409               .523                        8.301    .000
   Manufacturer Image     -.008      .668               -.001                       -.012    .991
   Overall Service        6.935      3.520              .580                        1.970    .052
   Salesforce Image       1.262      .951               .108                        1.327    .188
   Product Quality        .548       .356               .097                        1.540    .127

a. Dependent Variable: Usage Level


Three of the seven predictor variables (service, price flexibility and salesforce image)
were included in the regression model at p ≤ .05, based on Table 6.

Table 6
Variables Entered/Removed(a)

Model   Variables Entered    Variables Removed   Method
1       Service              .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2       Price Flexibility    .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
3       Salesforce Image     .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).

a. Dependent Variable: Usage Level

Based on Table 7 below, the correlation between usage level (dependent variable) and
service (predictor variable) is .701, and with the addition of price flexibility the
multiple correlation becomes .869. Combining all three predictor variables, Service,
Price Flexibility and Salesforce Image, the correlation increases to .877.
The R square values indicate that 49.1% of the variance in usage level is explained by
service, a further 26.4% by price flexibility and a further 1.3% by Salesforce Image.

Table 7

Model Summary(d)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .701(a)    .491       .486                6.44576
2       .869(b)    .755       .750                4.49798
3       .877(c)    .768       .761                4.39377

a. Predictors: (Constant), Overall Service
b. Predictors: (Constant), Overall Service, Price flexibility
c. Predictors: (Constant), Overall Service, Price flexibility, Salesforce Image
d. Dependent Variable: Usage Level
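
The share of variance added at each stepwise step is the difference between successive R Square values in Table 7. A minimal sketch of that subtraction follows; the dictionary names are illustrative.

```python
# Cumulative R Square after each predictor enters (Table 7, models 1 to 3).
cumulative_r_square = {
    "Overall Service": 0.491,
    "Price flexibility": 0.755,
    "Salesforce Image": 0.768,
}

increments = {}
previous = 0.0
for predictor, r_square in cumulative_r_square.items():
    # Incremental contribution = R Square now minus R Square before this step.
    increments[predictor] = round(r_square - previous, 3)
    previous = r_square

for predictor, gain in increments.items():
    print(f"{predictor}: +{gain * 100:.1f} percentage points of explained variance")
```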

The ANOVA results in Table 8 show a significant relationship between the three
predictor variables (Overall Service, Price Flexibility and Salesforce Image) and Usage
Level (the dependent variable), with p < .05.
For the model with service alone, the result is significant [F (1, 98) = 94.525, p < .05].
With Price Flexibility added, the result is [F (2, 97) = 149.184, p < .05], and with
Salesforce Image added, the result is [F (3, 96) = 106.115, p < .05].
Table 8

ANOVA(a)

Model            Sum of Squares   df   Mean Square   F         Sig.
1  Regression    3927.309         1    3927.309      94.525    .000(b)
   Residual      4071.691         98   41.548
   Total         7999.000         99
2  Regression    6036.513         2    3018.256      149.184   .000(c)
   Residual      1962.487         97   20.232
   Total         7999.000         99
3  Regression    6145.700         3    2048.567      106.115   .000(d)
   Residual      1853.300         96   19.305
   Total         7999.000         99

a. Dependent Variable: Usage Level
b. Predictors: (Constant), Overall Service
c. Predictors: (Constant), Overall Service, Price flexibility
d. Predictors: (Constant), Overall Service, Price flexibility, Salesforce Image
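
The F and R Square values can be recovered from the sums of squares in Table 8: F is the regression mean square divided by the residual mean square, and R Square is the regression sum of squares divided by the total. The sketch below recomputes both for each model; the function and variable names are illustrative.

```python
TOTAL_SS = 7999.000  # total sum of squares (same for all three models)

# model: (regression SS, regression df, residual SS, residual df) from Table 8
anova = {
    1: (3927.309, 1, 4071.691, 98),
    2: (6036.513, 2, 1962.487, 97),
    3: (6145.700, 3, 1853.300, 96),
}


def f_ratio(ss_reg: float, df_reg: int, ss_res: float, df_res: int) -> float:
    """F = (regression mean square) / (residual mean square)."""
    return (ss_reg / df_reg) / (ss_res / df_res)


results = {}
for model, (ss_reg, df_reg, ss_res, df_res) in anova.items():
    f_value = f_ratio(ss_reg, df_reg, ss_res, df_res)
    r_square = ss_reg / TOTAL_SS
    results[model] = (f_value, r_square)
    print(f"Model {model}: F({df_reg}, {df_res}) = {f_value:.3f}, R Square = {r_square:.3f}")
```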

Table 9 gives the unstandardized regression coefficients (B) for the three predictor
variables in the linear equation. The t tests show that each predictor is significant
at p < .05.
Table 9

Coefficients(a)

                          Unstandardized Coefficients   Standardized Coefficients
Model                     B          Std. Error         Beta                        t        Sig.
1  (Constant)             21.653     2.596                                          8.341    .000
   Overall Service        8.384      .862               .701                        9.722    .000
2  (Constant)             -3.489     3.057                                          -1.141   .257
   Overall Service        7.974      .603               .666                        13.221   .000
   Price flexibility      3.336      .327               .515                        10.210   .000
3  (Constant)             -6.520     3.247                                          -2.008   .047
   Overall Service        7.621      .607               .637                        12.547   .000
   Price flexibility      3.376      .320               .521                        10.562   .000
   Salesforce Image       1.406      .591               .121                        2.378    .019

a. Dependent Variable: Usage Level
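
The standardized Beta values in Table 9 are related to the unstandardized B values through the standard deviations in Table 2: Beta = B × (SD of predictor) / (SD of Usage Level). The sketch below checks this for model 3, assuming that standard relation; the names are illustrative.

```python
SD_USAGE = 8.98877  # Std. Deviation of Usage Level (Table 2)

# predictor: (B from Table 9 model 3, Std. Deviation from Table 2)
predictors = {
    "Overall Service": (7.621, 0.75126),
    "Price flexibility": (3.376, 1.38650),
    "Salesforce Image": (1.406, 0.77085),
}

# Rescale each slope from raw units to standard-deviation units.
betas = {name: b * sd_x / SD_USAGE for name, (b, sd_x) in predictors.items()}

for name, beta in betas.items():
    print(f"{name}: Beta = {beta:.3f}")
```

Because Beta is expressed in standard-deviation units, it allows the three predictors to be compared on a common scale, which B alone does not.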

The t test results for the remaining predictor variables show that their effect in the
linear combination is not significant for the dependent variable, so they were not
included in the regression model. Some of these variables have small Beta In values,
which led to their elimination from the regression model. Collinearity tolerance < 2.0
shows that the data do not have collinearity problems, meaning that no predictor
variables are highly correlated.

Table 10
Residuals Statistics(a)

                                     Minimum     Maximum   Mean      Std. Deviation   N
Predicted Value                      23.3730     60.5919   46.1000   7.87895          100
Std. Predicted Value                 -2.885      1.839     .000      1.000            100
Standard Error of Predicted Value    .467        1.429     .847      .236             100
Adjusted Predicted Value             23.1805     60.3876   46.1043   7.91480          100
Residual                             -12.55201   7.57363   .00000    4.32668          100
Std. Residual                        -2.857      1.724     .000      .985             100
Stud. Residual                       -2.983      1.737     .000      1.004            100
Deleted Residual                     -13.68695   7.69356   -.00429   4.49699          100
Stud. Deleted Residual               -3.115      1.756     -.004     1.017            100
Mahal. Distance                      .129        9.485     2.970     2.185            100
Cook's Distance                      .000        .201      .010      .022             100
Centered Leverage Value              .001        .096      .030      .022             100

a. Dependent Variable: Usage Level


The standardized residuals in Table 10 range from -2.857 to 1.724, within the range of
±3.3. This indicates that the research data have no problem with extreme values
(outliers) and fulfill the multiple regression requirements.
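
The outlier screen can be written as a small check on the extremes reported in Table 10. The ±3.3 limit is the one used in the text; the Cook's distance cutoff of 1 is an assumption added here as a commonly cited rule of thumb and does not come from the report.

```python
RESIDUAL_LIMIT = 3.3  # cutoff for standardized residuals used in the text
COOKS_CUTOFF = 1.0    # assumption: common rule of thumb, not from the report

# Extremes copied from Table 10.
std_residual_min = -2.857
std_residual_max = 1.724
cooks_distance_max = 0.201

# Both extremes must sit inside the +/- limit for the screen to pass.
residuals_within_limit = (
    -RESIDUAL_LIMIT <= std_residual_min and std_residual_max <= RESIDUAL_LIMIT
)
cooks_below_cutoff = cooks_distance_max < COOKS_CUTOFF

print(f"Standardized residuals within +/-{RESIDUAL_LIMIT}: {residuals_within_limit}")
print(f"Largest Cook's distance below {COOKS_CUTOFF}: {cooks_below_cutoff}")
```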

Stage 7: Validation of the Results


The data analysis shows that, among the population, three variables, namely a) Overall
Service, b) Price Flexibility and c) Salesforce Image, are the predictors of customers'
product usage levels.
The proposed regression model is
Y = b0 + b3X3 + b5X5 + b6X6
written as
Y = -6.520 + 3.376X3 + 7.621X5 + 1.406X6
where

b0 = constant (intercept)
b3 = change in usage level associated with unit change of Price Flexibility
b5 = change in usage level associated with unit change of Overall Service
b6 = change in usage level associated with unit change of Salesforce Image
X3 = Price Flexibility
X5 = Overall Service
X6 = Salesforce Image
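
The fitted equation can be used directly for prediction. The sketch below wraps it in a function (the function and argument names are illustrative) and evaluates it at the predictor means from Table 2; for a least-squares model with an intercept, this should return approximately the mean Usage Level of 46.10.

```python
def predict_usage(price_flexibility: float, overall_service: float,
                  salesforce_image: float) -> float:
    """Predicted usage level: Y = -6.520 + 3.376*X3 + 7.621*X5 + 1.406*X6."""
    return (-6.520
            + 3.376 * price_flexibility
            + 7.621 * overall_service
            + 1.406 * salesforce_image)


# Predictor means from Table 2 (X3, X5, X6).
usage_at_means = predict_usage(7.8940, 2.9160, 2.6650)
print(f"Predicted usage at the predictor means: {usage_at_means:.2f}")

# Effect of a one-unit increase in Overall Service, other predictors held fixed.
gain = predict_usage(7.8940, 3.9160, 2.6650) - usage_at_means
print(f"Change in predicted usage per unit of Overall Service: {gain:.3f}")
```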

Service alone [F (1, 98) = 94.525, p < .05] explains 49.1% of the variance (R2 = .491)
in customer product usage level, which means Service is the primary predictor of
usage level. The combination of service and price flexibility [F (2, 97) = 149.184, p < .05]
increases the explained variance to 75.5%, and the combination of Overall Service, Price
Flexibility and Salesforce Image [F (3, 96) = 106.115, p < .05] explains 76.8% of the
variance in product usage level.
Based on the analysis above, HATCO should emphasize the quality of its service
in its next campaign, as service is the main factor in customer usage level. In
addition, price flexibility and salesforce image can be considered as factors to
highlight in the next marketing campaign.
