BRM-Lecture 4-2023
Dr Teidorlang Lyngdoh
Associate Professor-Marketing
Session- 4
Today’s Outline
• In class exercise
• Correlations & Regression
What is a Correlation?
• It is a way of measuring the extent to which two variables are
related.
• It measures the pattern of responses across variables.
• The correlation coefficient computed from the sample data measures the strength and direction of a relationship between two variables.
• Sample correlation coefficient is denoted by r.
• Population correlation, often denoted as ρ
• Correlation does NOT necessarily imply causation
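A rough illustration (outside the SPSS workflow used in class): a minimal Python sketch of computing the sample correlation coefficient r; the data below are invented.

# Compute Pearson's r for two invented variables.
import numpy as np
from scipy import stats

hours_revised = np.array([2, 5, 8, 11, 15, 18, 22, 25])
exam_score = np.array([40, 45, 55, 52, 60, 68, 75, 80])

# pearsonr returns the sample correlation r and a p-value for H0: rho = 0
r, p_value = stats.pearsonr(hours_revised, exam_score)
print(f"r = {r:.3f}, p = {p_value:.4f}")  # r near +1 means a strong positive relationship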
Correlation & Causation
• Causation means a cause-and-effect relation.
• Correlation denotes interdependency among variables: to correlate two phenomena, it is essential that they are related, but the relationship need not be one of cause and effect.
• If two variables vary such that changes in one (the cause) are accompanied by changes in the other (the effect), with all other factors that could move the 'effect' held constant, then the two variables are said to have a cause-and-effect relationship.
• In other words, causation always implies correlation, but correlation does not always imply causation.
Very Small Relationship
[Scatter plot: Appreciation of Dimmu Borgir (y-axis) vs. Age (x-axis), showing a very small relationship]
Positive Relationship
[Scatter plot against Age (x-axis), showing a positive relationship]
Range of Values for the Correlation Coefficient
Variance vs Covariance
• Variance measures the spread or dispersion of a single random variable. It quantifies how much individual data points in a dataset deviate from the mean or expected value.
• Variance is useful for understanding the "spread" or variability in a dataset and is often used to calculate the standard deviation, which is the square root of the variance.
• Covariance measures the degree to which two random variables change together. It quantifies the relationship between two variables.
• If the variables tend to increase or decrease together, the covariance is positive; if one increases while the other decreases, the covariance is negative; if they're unrelated, the covariance is close to zero.
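A minimal Python sketch contrasting the two measures on invented numbers.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

var_x = np.var(x, ddof=1)             # sample variance: spread of x around its mean
cov_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance: how x and y move together

print(var_x)    # always non-negative
print(cov_xy)   # positive here, because x and y tend to increase together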
Pearson’s R
• Covariance does not really tell us much about the strength of
association (Solution: standardise this measure)
• Correlation is a standardized measure of the linear relationship
between two variables, derived from covariance. It always falls within
the range of -1 (perfect negative correlation) to 1 (perfect positive
correlation).
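A small Python sketch of that standardisation, with invented data: dividing the covariance by the product of the two standard deviations gives r.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

cov_xy = np.cov(x, y, ddof=1)[0, 1]
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))  # standardised covariance

print(r)                        # always between -1 and +1
print(np.corrcoef(x, y)[0, 1])  # same value from NumPy's built-in correlation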
Correlation: Example
• Anxiety and Exam Performance
• Participants:
• 103 students
• Measures
• Time spent revising (hours)
• Exam performance (%)
• Exam Anxiety (the EAQ, score out of 100)
• Gender
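For self-study outside SPSS, a hypothetical Python sketch of inspecting such correlations; the column names mirror the lecture's measures, but the values are invented stand-ins, not the class dataset.

import pandas as pd

data = pd.DataFrame({
    "revise": [10, 20, 5, 35, 15, 40],    # hours spent revising (invented)
    "exam": [45, 60, 30, 80, 50, 85],     # exam performance, % (invented)
    "anxiety": [90, 70, 95, 40, 80, 35],  # EAQ score out of 100 (invented)
})

print(data.corr(method="pearson"))  # pairwise Pearson correlation matrix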
Correlations in SPSS
Correlations Output
Correlations in SPSS
• Check out the correlations between the different variables in the
Benetton data set
Multiple Regression
Multiple Regression
• Independent Variables (IVs): Career Limitations & Experience
• Dependent Variable (DV): Days until employed
Forecasting is like trying to drive a car
blindfolded and following directions given
by a person who is looking out the back
window.
Examples
• Insurance companies heavily rely on regression analysis to estimate
the credit standing of policyholders and a possible number of claims
in a given time period
• A retail store manager may believe that extending shopping hours will greatly increase sales. Regression analysis, however, may indicate that the increase in revenue might not be sufficient to support the rise in operating expenses due to longer working hours (such as additional employee labor charges)
• Analysis of data from point-of-sale systems and purchase accounts may highlight market patterns, such as increased demand on certain days of the week or at certain times of the year
Regression
• Simple regression: Y = b0 + b1X1
• Multiple regression: Y = b0 + b1X1 + b2X2 + … + bnXn
Multiple Regression as an Equation
• With multiple regression the relationship is described using a
variation of the equation of a straight line.
Y = b0 + b1X1 + b2X2 + … + bnXn + εi
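A minimal Python sketch of fitting such an equation; the variable names echo the lecture's example (career limitations, experience, days until employed), but the data are invented for illustration.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "days_unemployed": [30, 55, 20, 70, 40, 65, 25, 50],
    "career_limitations": [2, 5, 1, 7, 3, 6, 2, 4],
    "experience_years": [8, 3, 10, 1, 6, 2, 9, 4],
})

# Fit Y = b0 + b1*X1 + b2*X2 + error by ordinary least squares
model = smf.ols("days_unemployed ~ career_limitations + experience_years",
                data=df).fit()
print(model.params)     # b0 (intercept), b1, b2
print(model.summary())  # coefficients, p-values, R-squared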
Methods of Regression
• Hierarchical:
• Experimenter decides the order in which variables are entered
into the model.
• Forced Entry:
• All predictors are entered simultaneously.
• Stepwise:
• Predictors are selected using their semi-partial correlation with
the outcome.
Beta Values
• b0 is the intercept.
• If the p-value is less than 0.05 (your significance level), the result is "statistically significant": there is strong evidence to reject the null hypothesis and support the alternative hypothesis. In simple terms, something interesting is indeed happening.
• 1) Independent variables: Career Limitations (CL) & Experience.
• 2) Significance levels of the IVs: the p-values for CL and Experience are less than .05, so both IVs make statistically significant contributions.
• 3) Unstandardized coefficients of the IVs:
• CL = 2.658: as the CL index increases by 1 unit, we see a 2.658-unit change in the DV (days until employed).
• Experience = -4.044: as experience increases by 1 year, the number of days until employed (DV) decreases by about 4; more experience means fewer days of unemployment.
• Standardized (beta) coefficients: for every 1 standard deviation increase in CL, the DV increases by 2.33 standard deviations; as experience increases by 1 standard deviation, the DV decreases by 4.36 standard deviations.
• 95% confidence interval: there is a 95% chance that the actual value of the unstandardized coefficient lies between .671 and 4.644.
(A Python sketch of extracting these quantities follows below.)
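A Python sketch of pulling out the quantities annotated above (unstandardised b coefficients, p-values, 95% confidence intervals, and standardised betas); the data and names are invented, not the class output.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "days_unemployed": [30, 55, 20, 70, 40, 65, 25, 50],
    "career_limitations": [2, 5, 1, 7, 3, 6, 2, 4],
    "experience_years": [8, 3, 10, 1, 6, 2, 9, 4],
})
formula = "days_unemployed ~ career_limitations + experience_years"

model = smf.ols(formula, data=df).fit()
print(model.params)          # unstandardised b: unit change in the DV per unit of the IV
print(model.pvalues)         # significance of each predictor
print(model.conf_int(0.05))  # 95% confidence interval for each b

z = (df - df.mean()) / df.std()               # z-score every variable
print(smf.ols(formula, data=z).fit().params)  # standardised betas: SD change in the DV per SD of the IV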
Generalization
• When we run regression, we hope to be able to
generalize the sample model to the entire population.
• To do this, several assumptions must be met.
• Violating these assumptions stops us generalizing
conclusions to our target population.
Straightforward Assumptions
• Variable Type:
• Outcome must be continuous
• Predictors can be continuous or dichotomous.
• Non-Zero Variance:
• Predictors must not have zero variance.
• Linearity:
• The relationship we model is, in reality, linear.
• Independence:
• All values of the outcome should come from a different person.
The More Tricky Assumptions
• No Multicollinearity:
• Predictors must not be highly correlated.
• Homoscedasticity:
• For each value of the predictors the variance of the error term should be
constant.
• Independent Errors:
• For any pair of observations, the error terms should be uncorrelated.
• Normally-distributed Errors
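One common check of the multicollinearity assumption is the variance inflation factor (VIF); a sketch with invented predictors follows (a VIF well above roughly 10 is often read as problematic).

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

predictors = pd.DataFrame({
    "career_limitations": [2, 5, 1, 7, 3, 6, 2, 4],
    "experience_years": [8, 3, 10, 1, 6, 2, 9, 4],
})
X = sm.add_constant(predictors)  # add the intercept column

for i, name in enumerate(X.columns):
    # VIF for the constant term is usually ignored
    print(name, variance_inflation_factor(X.values, i))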
More Explanations of Regression - Self Learning
Hierarchical Regression
• Known predictors (based on past research) are entered into the regression model first.
• New predictors are then entered in a separate step/block.
• The experimenter makes the decisions.
• You can see the unique predictive influence of a new variable on the outcome because known predictors are held constant in the model (see the sketch after this comparison).
Forced Entry Regression
• All variables are entered into the model simultaneously.
• The results obtained depend on the variables entered into the model.
• It is important, therefore, to have good theoretical reasons for including a particular variable.
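A sketch of hierarchical (blockwise) entry with invented data: the known predictor is fitted first, the new predictor is added in a second block, and the change in R-squared is read as its unique contribution.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "days_unemployed": [30, 55, 20, 70, 40, 65, 25, 50],
    "career_limitations": [2, 5, 1, 7, 3, 6, 2, 4],
    "experience_years": [8, 3, 10, 1, 6, 2, 9, 4],
})

block1 = smf.ols("days_unemployed ~ experience_years", data=df).fit()
block2 = smf.ols("days_unemployed ~ experience_years + career_limitations",
                 data=df).fit()

print(block1.rsquared, block2.rsquared)
print(block2.rsquared - block1.rsquared)  # unique variance explained by the added predictor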
Stepwise Regression I
• The Durbin-Watson test reports a test statistic with a value from 0 to 4, where:
• 2 means no autocorrelation;
• 0 to <2 indicates positive autocorrelation (common in time-series data);
• >2 to 4 indicates negative autocorrelation (less common in time-series data).
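A small sketch of the Durbin-Watson check on a fitted model's residuals (invented data); a value near 2 suggests the errors are uncorrelated.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

df = pd.DataFrame({
    "days_unemployed": [30, 55, 20, 70, 40, 65, 25, 50],
    "career_limitations": [2, 5, 1, 7, 3, 6, 2, 4],
    "experience_years": [8, 3, 10, 1, 6, 2, 9, 4],
})
model = smf.ols("days_unemployed ~ career_limitations + experience_years",
                data=df).fit()

print(durbin_watson(model.resid))  # ~2 means little autocorrelation in the errors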
Checking Assumptions about Errors
• Homoscedasticity/Independence of Errors:
• Plot ZRESID against ZPRED.
• Normality of Errors:
• Normal probability plot.
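A sketch of rough Python equivalents of these two checks (invented data; this only loosely mirrors SPSS's ZRESID/ZPRED output).

import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "days_unemployed": [30, 55, 20, 70, 40, 65, 25, 50],
    "career_limitations": [2, 5, 1, 7, 3, 6, 2, 4],
    "experience_years": [8, 3, 10, 1, 6, 2, 9, 4],
})
model = smf.ols("days_unemployed ~ career_limitations + experience_years",
                data=df).fit()

# Standardise predicted values and residuals, then plot them against each other;
# a shapeless cloud around zero is consistent with homoscedasticity.
zpred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()
zresid = (model.resid - model.resid.mean()) / model.resid.std()
plt.scatter(zpred, zresid)
plt.xlabel("ZPRED")
plt.ylabel("ZRESID")
plt.show()

# Normal probability plot of the residuals; points should follow the diagonal.
stats.probplot(model.resid, dist="norm", plot=plt)
plt.show()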
Regression Plots
Homoscedasticity: ZRESID vs. ZPRED
Normality of Errors: Histograms and P-P Plots