
Business Research Methods

Dr Teidorlang Lyngdoh
Associate Professor, Marketing
Session 4
Today’s Outline
• In class exercise
• Correlations & Regression
What is a Correlation?
• It is a way of measuring the extent to which two variables are
related.
• It measures the pattern of responses across variables.
• The correlation coefficient computed from the sample data measures the strength and direction of the relationship between two variables.
• The sample correlation coefficient is denoted by r.
• The population correlation coefficient is denoted by ρ.
• Correlation does NOT necessarily imply causation
Correlation & Causation
• Causation means a cause-and-effect relationship.
• Correlation denotes interdependency among variables: for two phenomena to be correlated, they must be related, but the relationship need not be one of cause and effect.
• If two variables vary such that changes in one (the cause) are accompanied by changes in the other (the effect), with all other factors that could move the 'effect' held constant, then the two variables are said to have a cause-and-effect relationship.
• In other words, causation always implies correlation, but correlation does not always imply causation.
Very Small Relationship
[Scatter plot: Appreciation of Dimmu Borgir (y-axis) against Age (x-axis), showing virtually no relationship]
Positive Relationship
[Scatter plot: Appreciation of Dimmu Borgir (y-axis) against Age (x-axis), showing a positive relationship]
Range of Values for the Correlation Coefficient
Variance vs Covariance
• Variance measures the spread or dispersion of a single random variable. It quantifies how much individual data points in a dataset deviate from the mean or expected value.
• Variance is useful for understanding the "spread" or variability in a dataset and is often used to calculate the standard deviation, which is the square root of the variance.
• Covariance measures the degree to which two random variables change together. It quantifies the relationship between two variables.
• If the variables tend to increase or decrease together, the covariance is positive; if one increases while the other decreases, the covariance is negative; if they're unrelated, the covariance is close to zero.
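A minimal sketch of these two quantities in Python, assuming numpy and a small made-up data set of revision hours and exam scores (not the class data):

```python
import numpy as np

# Made-up data: hours spent revising and exam performance (%)
revision_hours = np.array([2, 5, 8, 11, 14, 17, 20])
exam_score = np.array([40, 45, 55, 60, 68, 75, 82])

# Variance: how much a single variable deviates from its mean (ddof=1 = sample variance)
print("Variance of revision hours:", np.var(revision_hours, ddof=1))

# Covariance: how the two variables change together.
# np.cov returns a 2x2 matrix; the off-diagonal entry is cov(hours, score)
cov_matrix = np.cov(revision_hours, exam_score)
print("Covariance of hours and score:", cov_matrix[0, 1])
```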
Pearson’s R
• Covariance does not really tell us much about the strength of
association (Solution: standardise this measure)
• Correlation is a standardized measure of the linear relationship
between two variables, derived from covariance. It always falls within
the range of -1 (perfect negative correlation) to 1 (perfect positive
correlation).
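As a rough illustration of the standardisation step (again with made-up data; scipy's pearsonr is assumed to be available):

```python
import numpy as np
from scipy import stats

revision_hours = np.array([2, 5, 8, 11, 14, 17, 20])
exam_score = np.array([40, 45, 55, 60, 68, 75, 82])

# Standardise covariance: divide by the product of the two standard deviations
cov_xy = np.cov(revision_hours, exam_score)[0, 1]
r_manual = cov_xy / (np.std(revision_hours, ddof=1) * np.std(exam_score, ddof=1))

# The same coefficient from scipy, which also returns a p-value
r, p_value = stats.pearsonr(revision_hours, exam_score)
print(r_manual, r, p_value)  # r always lies between -1 and +1
```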
Correlation: Example
• Anxiety and Exam Performance
• Participants:
• 103 students
• Measures
• Time spent revising (hours)
• Exam performance (%)
• Exam Anxiety (the EAQ, score out of 100)
• Gender
Correlations in SPSS
Correlations Output
Correlations in SPSS
• Check out the correlations between the different variables in the
Benetton data set
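If you want to explore the same correlations outside SPSS, a pandas sketch might look like this (the file name benetton.csv and its columns are assumptions, not the actual class file):

```python
import pandas as pd

# Hypothetical file and columns; substitute the actual Benetton data set used in class
df = pd.read_csv("benetton.csv")

# Pairwise Pearson correlations between all numeric variables,
# analogous to the SPSS correlations table (without the significance values)
print(df.select_dtypes("number").corr())
```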

Multiple Regression
Multiple Regression
• Independent Variables (IVs): Career Limitations & Experience
• Dependent Variable (DV): Days until employed
Forecasting is like trying to drive a car
blindfolded and following directions given
by a person who is looking out the back
window.
Examples
• Insurance companies rely heavily on regression analysis to estimate the credit standing of policyholders and the likely number of claims in a given time period.
• A retail store manager may believe that extending shopping hours will greatly increase sales. Regression analysis, however, may indicate that the increase in revenue might not be sufficient to support the rise in operating expenses from longer working hours (such as additional employee labour charges).
• Analysis of data from point-of-sale systems and purchase accounts may highlight market patterns, such as increased demand on certain days of the week or at certain times of the year.
Regression
• Simple regression: Y = b0 + b1x
• Multiple regression: Y = b0 + b1x1 + b2x2 + … + bnxn
Multiple Regression as an Equation
• With multiple regression the relationship is described using a
variation of the equation of a straight line.

y = b0 + b1X1 + b2X2 + … + bnXn + εi
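A sketch of how such a model could be fitted in Python with statsmodels, mirroring the lecture example; the file and column names (employment.csv, career_limitations, experience, days_until_employed) are hypothetical stand-ins:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical file and column names mirroring the lecture example:
# career_limitations and experience (IVs) predicting days_until_employed (DV)
df = pd.read_csv("employment.csv")

X = sm.add_constant(df[["career_limitations", "experience"]])  # adds the intercept b0
y = df["days_until_employed"]

model = sm.OLS(y, X).fit()
print(model.summary())  # R Square, coefficients b1 and b2, p-values, 95% CIs
```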
Methods of Regression
• Hierarchical:
• Experimenter decides the order in which variables are entered
into the model.
• Forced Entry:
• All predictors are entered simultaneously.
• Stepwise:
• Predictors are selected using their semi-partial correlation with
the outcome.
Beta Values
• b0 is the intercept: the value of the Y variable when all Xs = 0. This is the point at which the regression plane crosses the Y-axis (vertical).
• b1 is the regression coefficient for variable 1.
• b2 is the regression coefficient for variable 2.
• bn is the regression coefficient for the nth variable.
SPSS & Output Interpretation
Multiple Regression Interpretation
• Adjusted R Square = .237: 23.7 percent of the variance in the DV is explained by the IVs.
• Significance = .000 (statistically significant). When we talk about a significance level of 0.05, we're referring to a threshold that helps us decide whether a result or finding is "statistically significant".
• If the p-value is less than 0.05 (your significance level), you say your results are "statistically significant". This means there is strong evidence to reject the null hypothesis and support the alternative hypothesis. In simple terms, you believe that something interesting is indeed happening.
1) Independent variables: CL (Career Limitations) & Experience.
2) Significance levels of the IVs (CL & Experience): the p-values are less than .05, so there are statistically significant contributions from both IVs.
3) Unstandardized coefficients of the IVs:
• CL = 2.658: as the CL index increases by 1 unit, we see a 2.658-unit change in the DV.
• Experience = -4.044: as experience increases by 1 year, the number of days until employed (DV) decreases by about 4. More experience, fewer days of unemployment.
• Standardized coefficients (Beta): for every 1 standard deviation increase in CL, the DV increases by 2.33 standard deviations; as experience increases by 1 standard deviation, the DV decreases by 4.36 standard deviations.
• 95% confidence interval: the 95% confidence interval for the unstandardized coefficient of CL runs from .671 to 4.644.
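Continuing the hypothetical statsmodels fit from the earlier sketch, the quantities discussed above (adjusted R Square, unstandardized b, p-values, 95% confidence intervals, and standardized betas) could be pulled out like this:

```python
# Continuing the hypothetical model and DataFrame from the earlier statsmodels sketch
print("Adjusted R Square:", model.rsquared_adj)   # share of DV variance explained
print(model.params)                               # unstandardized coefficients (b)
print(model.pvalues)                              # significance of each IV
print(model.conf_int(alpha=0.05))                 # 95% confidence intervals

# Standardized (beta) coefficients: z-score every variable and refit
cols = ["days_until_employed", "career_limitations", "experience"]
z = (df[cols] - df[cols].mean()) / df[cols].std()
beta_model = sm.OLS(z["days_until_employed"],
                    sm.add_constant(z[["career_limitations", "experience"]])).fit()
print(beta_model.params)  # change in the DV (in SDs) per 1 SD change in each IV
```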
Generalization
• When we run regression, we hope to be able to
generalize the sample model to the entire population.
• To do this, several assumptions must be met.
• Violating these assumptions stops us generalizing
conclusions to our target population.
Straightforward Assumptions
• Variable Type:
• Outcome must be continuous
• Predictors can be continuous or dichotomous.
• Non-Zero Variance:
• Predictors must not have zero variance.
• Linearity:
• The relationship we model is, in reality, linear.
• Independence:
• All values of the outcome should come from a different person.
The More Tricky Assumptions
• No Multicollinearity:
• Predictors must not be highly correlated.
• Homoscedasticity:
• For each value of the predictors the variance of the error term should be
constant.
• Independent Errors:
• For any pair of observations, the error terms should be uncorrelated.
• Normally-distributed Errors
More Explanations of Regression (Self-Learning)
Hierarchical Regression
• Known predictors (based on past research) are entered into the regression model first.
• New predictors are then entered in a separate step/block.
• The experimenter makes the decisions.
• You can see the unique predictive influence of a new variable on the outcome because known predictors are held constant in the model.

Forced Entry Regression
• All variables are entered into the model simultaneously.
• The results obtained depend on the variables entered into the model.
• It is important, therefore, to have good theoretical reasons for including a particular variable.
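A rough sketch of hierarchical (blockwise) entry, reusing the hypothetical employment data and column names from the earlier regression sketch: fit block 1 with the known predictor, add the new predictor in block 2, and inspect the change in R Square.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: block 1 = known predictor, block 2 adds the new predictor
df = pd.read_csv("employment.csv")
y = df["days_until_employed"]

block1 = sm.OLS(y, sm.add_constant(df[["experience"]])).fit()
block2 = sm.OLS(y, sm.add_constant(df[["experience", "career_limitations"]])).fit()

# The R Square change is the unique contribution of the predictor added in block 2
print("Block 1 R Square:", block1.rsquared)
print("Block 2 R Square:", block2.rsquared)
print("R Square change:", block2.rsquared - block1.rsquared)
```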
Stepwise Regression I
• Variables are entered into the model based on mathematical criteria.
• The computer selects variables in steps.
• Step 1: SPSS looks for the predictor that can explain the most variance in the outcome variable.
• Step 2: having selected the 1st predictor, a second one is chosen from the remaining predictors. The semi-partial correlation is used as the criterion for selection.
• Should be used only for exploration.
Linearity
Outliers
Homoscedasticity
Homoscedasticity
• It refers to the condition where the variance of the errors (or
residuals) of a regression model is constant across all levels of the
independent variables.
• In simpler terms, it means that the spread or dispersion of the
residuals should be roughly the same for all values of the predictor
variables.
Multicollinearity
• Multicollinearity is a statistical concept where several independent
variables in a model are correlated
• Multicollinearity among independent variables will result in less
reliable statistical inferences.
• Multicollinearity makes it challenging to understand the individual
effect of each predictor variable because they are highly correlated
• This assumption can be checked with collinearity diagnostics:
  • Tolerance should be more than 0.2 (Menard, 1995)
  • VIF should be less than 10 (Myers, 1990)
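A sketch of the same collinearity diagnostics in Python, assuming the hypothetical predictors from the earlier regression example:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors, mirroring the SPSS collinearity diagnostics
df = pd.read_csv("employment.csv")
X = sm.add_constant(df[["career_limitations", "experience"]])

for i, name in enumerate(X.columns):
    if name == "const":
        continue  # the intercept column is not a predictor
    vif = variance_inflation_factor(X.values, i)
    # Rules of thumb from the slide: VIF < 10 and Tolerance (= 1/VIF) > 0.2
    print(f"{name}: VIF = {vif:.2f}, Tolerance = {1 / vif:.2f}")
```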
Autocorrelation
• Autocorrelation is a measure of the similarity between data points at different time lags. It helps us understand whether there's a pattern or relationship between observations at different time points.
• The Durbin-Watson test reports a test statistic with a value from 0 to 4, where:
  • 2 means no autocorrelation;
  • 0 to <2 indicates positive autocorrelation (common in time series data);
  • >2 to 4 indicates negative autocorrelation (less common in time series data).
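Assuming the hypothetical fitted model from the earlier regression sketch, the Durbin-Watson statistic can be computed with statsmodels:

```python
from statsmodels.stats.stattools import durbin_watson

# Assuming the hypothetical fitted OLS model from the earlier sketch
dw = durbin_watson(model.resid)
print("Durbin-Watson:", dw)  # ~2 = none, <2 = positive, >2 = negative autocorrelation
```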
Checking Assumptions about Errors
• Homoscedasticity/Independence of Errors:
• Plot ZRESID against ZPRED.
• Normality of Errors:
• Normal probability plot.
Regression Plots
Homoscedasticity: ZRESID vs. ZPRED
Normality of Errors: Histograms and P-P Plots
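A sketch of both diagnostic plots for the hypothetical fitted model from the earlier sketches (ZRESID vs. ZPRED for homoscedasticity, and a probability plot of the residuals for normality):

```python
import matplotlib.pyplot as plt
from scipy import stats

# Assuming the hypothetical fitted statsmodels model from the earlier sketches
zpred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()
zresid = (model.resid - model.resid.mean()) / model.resid.std()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Homoscedasticity / independence: ZRESID vs. ZPRED should look like a random cloud
ax1.scatter(zpred, zresid)
ax1.axhline(0, linestyle="--")
ax1.set_xlabel("Standardized predicted values (ZPRED)")
ax1.set_ylabel("Standardized residuals (ZRESID)")

# Normality of errors: points should follow the diagonal line
stats.probplot(model.resid, dist="norm", plot=ax2)

plt.tight_layout()
plt.show()
```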